bq: "..._version_ will change on updates", shouldn't that be OK....

Absolutely not OK. Lucene/Solr relies on the uniqueKey to tell
documents apart: two docs with the same uniqueKey are the same
document. So if you update a doc it _must_ have the same uniqueKey
or it gets added as a completely new document in addition to the
old one. Relying on a _version_ field that changes on you whenever
you update docs (and this is _not_ under your control) seems... fraught.

The net result is that you have two visible copies of the same
document. Not good.
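
To make the overwrite-vs-duplicate behavior concrete, here is a toy model in Java. It is only a sketch: ToyIndex is a hypothetical class backed by a plain map, not Solr code, and it models nothing beyond the "same uniqueKey replaces, different uniqueKey adds" rule described above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for an index keyed by uniqueKey; not Solr code.
final class ToyIndex {
    private final Map<String, String> docs = new LinkedHashMap<>();

    // Adding with an existing uniqueKey replaces the old doc;
    // adding with a new uniqueKey creates a second, separate doc.
    void add(String uniqueKey, String body) {
        docs.put(uniqueKey, body);
    }

    int size() {
        return docs.size();
    }

    public static void main(String[] args) {
        ToyIndex idx = new ToyIndex();
        idx.add("doc-1", "first version");
        idx.add("doc-1", "updated version");  // same key: still one doc
        idx.add("doc-1b", "updated version"); // key changed: now two docs
        System.out.println(idx.size());
    }
}
```

If the key changes between the original add and the "update", the second copy is exactly the duplicate described above.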

You must have something you can use as a uniqueKey. You say
"If I do a look up based on URL , I am bound to face issues with
character escaping and all"

How do you propose to correlate the UUID field to the URL for
your lookups anyway?

You say:
"To avoid that I was using a UUID for look up , but in SolrCloud it
generates unique per replica , which is not acceptable"

Why not?

The whole _point_ of UUIDs is that they're, well, unique (or at
least very close) no matter where/when they're created so why is
it a problem to generate them on different replicas (NOT as the
uniqueKey however)?

But you still have to make the UUID <-> URL connection, where is
that being handled?

All in all, it seems like you're making this much more difficult than
it needs to be and would be well-served by:
1> learning to escape the URLs
or
2> massaging the URL to something more consumable and living
with what might be very occasional duplication
or
3> generating your own UUID on a single machine during indexing
and injecting that into the record (not with UUIDProcesor..., just
the Java class assuming your ingestion is Java based).
or
4> trusting the UUID generation code will keep UUIDs that
are automatically generated on different machines unique enough for
practical purposes.
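
A rough sketch of what options 1 and 3 could look like in Java, plus the name-based UUID Michael suggested. The UrlKeys class, its method names, and the example URL are all made up for illustration. The escaper is hand-rolled so the snippet has no dependencies; I believe SolrJ's ClientUtils.escapeQueryChars does the same job. Either way the escaped value still needs ordinary URL-encoding before it goes into an HTTP request.

```java
import java.nio.charset.StandardCharsets;
import java.util.UUID;

// Hypothetical helper class; names are illustrative only.
final class UrlKeys {
    // Characters with special meaning in the Lucene/Solr query syntax.
    private static final String SPECIAL = "\\+-!():^[]\"{}~*?|&;/";

    // Option 1: backslash-escape a raw value for use in a query like url:<value>.
    static String escapeForQuery(String raw) {
        StringBuilder sb = new StringBuilder(raw.length() + 8);
        for (int i = 0; i < raw.length(); i++) {
            char c = raw.charAt(i);
            if (SPECIAL.indexOf(c) >= 0 || Character.isWhitespace(c)) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    // Name-based (type 3) UUID: the same URL always yields the same UUID,
    // so the UUID <-> URL connection can be recomputed at lookup time.
    static UUID uuidFromUrl(String url) {
        return UUID.nameUUIDFromBytes(url.getBytes(StandardCharsets.UTF_8));
    }

    // Option 3: random (type 4) UUID generated once in the indexing client.
    static UUID newRandomId() {
        return UUID.randomUUID();
    }

    public static void main(String[] args) {
        String url = "http://example.com/doctor?id=42"; // made-up URL
        System.out.println("url:" + escapeForQuery(url));
        System.out.println(uuidFromUrl(url));
        System.out.println(newRandomId());
    }
}
```

Note the trade-off: a random UUID has to be stored somewhere alongside the URL so you can find the doc again, whereas the name-based one can be re-derived from the URL whenever you need it.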


Best,
Erick

On Thu, Nov 13, 2014 at 12:06 PM, Michael Della Bitta
<michael.della.bi...@appinions.com> wrote:
> You could also find a natural key that doesn't look like an ID and create a
> name-based (Type 3) UUID out of it, with something like Java's
> nameUUIDFromBytes:
>
> https://docs.oracle.com/javase/7/docs/api/java/util/UUID.html#nameUUIDFromBytes%28byte%5B%5D%29
>
> Implementations of this exist in other languages as well.
>
>
> On 11/13/14 11:35, Shawn Heisey wrote:
>>
>> On 11/12/2014 10:45 PM, S.L wrote:
>>>
>>> We know that the _version_ field is a mandatory field in the SolrCloud
>>> schema.xml, it is expected to be of type long, and it also seems to have
>>> a unique value in a collection.
>>>
>>> However the query of the form
>>>
>>> http://server1.mydomain.com:7344/solr/collection1/select/?q=*:*&fq=%28_version_:1484632548944380000%29&wt=json
>>> does not seem to return any record. Can we query on the _version_ field
>>> in the schema.xml?
>>
>> I've been watching your journey unfold on the mailing list.  The whole
>> thing seems like an XY problem.
>>
>> If I'm reading everything correctly, you want to have a unique ID value
>> that can serve as the uniqueKey, as well as a way to quickly look up a
>> single document in Solr.
>>
>> Is there one part of the URL that serves as a unique identifier that
>> doesn't contain special characters?  It seems insane that you would not
>> have a unique ID value for every entity in your system that is composed
>> of only "regular" characters.
>>
>> Assuming that such an ID exists (and is likely used as one piece of that
>> doctorURL that you mentioned) ... if you can extract that ID value into
>> its own field (either in your indexing code or a custom update
>> processor), you could use that for both uniqueKey and single-document
>> lookups.  Having that kind of information in your index seems like a
>> generally good idea.
>>
>> Thanks,
>> Shawn
>>
>
