bq: "..._version_ will change on updates", shouldn't that be OK...

Absolutely not OK. Lucene/Solr relies on the uniqueKey to tell documents apart: two docs with the same uniqueKey are the same document. So if you update a doc, it _must_ have the same uniqueKey, or it gets added as a completely new document in addition to the old one. Having the _version_ field change on you when you update docs (and this is _not_ under your control) seems... fraught.
Net-net, you then have two visible copies of the same document. Not good. You must have something you can use as a uniqueKey.

You say: "If I do a look up based on URL, I am bound to face issues with character escaping and all." How do you propose to correlate the UUID field to the URL for your lookups anyway?

You say: "To avoid that I was using a UUID for look up, but in SolrCloud it generates unique per replica, which is not acceptable." Why not? The whole _point_ of UUIDs is that they're, well, unique (or at least very close to it) no matter where or when they're created, so why is it a problem to generate them on different replicas (NOT as the uniqueKey, however)? But you still have to make the UUID <-> URL connection; where is that being handled?

All in all, it seems like you're making this much more difficult than it needs to be, and you would be well served by one of these:
1> learning to escape the URLs, or
2> massaging the URL into something more consumable and living with what might be very occasional duplication, or
3> generating your own UUID on a single machine during indexing and injecting it into the record (not with UUIDProcessor..., just the Java UUID class, assuming your ingestion is Java-based), or
4> trusting that the UUID generation code will keep UUIDs that are automatically generated on different machines unique enough for practical purposes.

Best,
Erick

On Thu, Nov 13, 2014 at 12:06 PM, Michael Della Bitta <michael.della.bi...@appinions.com> wrote:

> You could also find a natural key that doesn't look like an ID and create a
> name-based (Type 3) UUID out of it, with something like Java's
> nameUUIDFromBytes:
>
> https://docs.oracle.com/javase/7/docs/api/java/util/UUID.html#nameUUIDFromBytes%28byte%5B%5D%29
>
> Implementations of this exist in other languages as well.
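Michael's suggestion can be sketched with the JDK alone. `UUID.nameUUIDFromBytes` hashes the input bytes with MD5 and returns a version 3 (name-based) UUID, so the same URL always yields the same UUID no matter which client or machine computes it. The resulting string contains only hex digits and hyphens, which avoids most of the escaping trouble raw URLs cause. This is a minimal sketch, assuming the indexing client (not Solr) computes the value and that URLs are encoded as UTF-8 before hashing; the class name `UrlKey`, the method `keyFor`, and the example URL are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.UUID;

public class UrlKey {
    // Hypothetical helper: derive a deterministic, name-based (version 3, MD5)
    // UUID from a URL. Identical input bytes always produce the identical UUID,
    // so every indexing client computes the same key for the same URL.
    static UUID keyFor(String url) {
        return UUID.nameUUIDFromBytes(url.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        // Hypothetical URL containing characters that are special in Lucene query syntax.
        String url = "http://www.example.com/doctors/search?name=smith&id=(123)";
        UUID first = keyFor(url);
        UUID second = keyFor(url);

        System.out.println(first);                 // hex digits and hyphens only
        System.out.println(first.version());       // 3 = name-based (MD5)
        System.out.println(first.equals(second));  // true: same URL, same UUID
    }
}
```

Note that URLs differing only trivially (a trailing slash, parameter order) hash to different UUIDs; if you also massage the URL as in Erick's option 2, do that normalization before hashing.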
>
> On 11/13/14 11:35, Shawn Heisey wrote:
>>
>> On 11/12/2014 10:45 PM, S.L wrote:
>>>
>>> We know that the _version_ field is a mandatory field in the SolrCloud
>>> schema.xml, that it is expected to be of type long, and that it also seems
>>> to have a unique value in a collection.
>>>
>>> However, a query of the form
>>>
>>> http://server1.mydomain.com:7344/solr/collection1/select/?q=*:*&fq=%28_version_:1484632548944380000%29&wt=json
>>>
>>> does not seem to return any record. Can we query on the _version_ field
>>> in the schema.xml?
>>
>> I've been watching your journey unfold on the mailing list. The whole
>> thing seems like an XY problem.
>>
>> If I'm reading everything correctly, you want to have a unique ID value
>> that can serve as the uniqueKey, as well as a way to quickly look up a
>> single document in Solr.
>>
>> Is there one part of the URL that serves as a unique identifier that
>> doesn't contain special characters? It seems insane that you would not
>> have a unique ID value for every entity in your system that is composed
>> of only "regular" characters.
>>
>> Assuming that such an ID exists (and is likely used as one piece of that
>> doctorURL that you mentioned) ... if you can extract that ID value into
>> its own field (either in your indexing code or a custom update
>> processor), you could use that for both uniqueKey and single-document
>> lookups. Having that kind of information in your index seems like a
>> generally good idea.
>>
>> Thanks,
>> Shawn
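Shawn's "extract the ID in your indexing code" route might look like the following minimal sketch. The real doctorURL format never appears in the thread, so the layout `.../doctors/<numeric id>/<slug>`, the field name `id`, and the class and method names are all hypothetical; only `doctorURL` comes from the discussion. The document is modeled as a plain field map rather than any particular Solr client type.

```java
import java.net.URI;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DoctorIdExtractor {
    // Hypothetical URL layout: .../doctors/<numeric id>/<slug>. Adjust the
    // pattern to whatever part of your real URLs is the stable identifier.
    private static final Pattern DOCTOR_ID = Pattern.compile("/doctors/(\\d+)(?:/|$)");

    // Returns the ID segment of the URL's path, or empty if the URL does not
    // follow the assumed layout. URI.create throws IllegalArgumentException
    // for malformed URLs (e.g. unencoded spaces).
    static Optional<String> extractId(String doctorUrl) {
        String path = URI.create(doctorUrl).getPath();
        if (path == null) {
            return Optional.empty(); // opaque URIs such as "mailto:..." have no path
        }
        Matcher m = DOCTOR_ID.matcher(path);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    // Builds a document as a field map: the extracted value goes into its own
    // field (hypothetically "id", intended as the uniqueKey) and the full URL
    // is kept in "doctorURL" for display or secondary lookups.
    static Map<String, Object> toDocument(String doctorUrl) {
        String id = extractId(doctorUrl)
                .orElseThrow(() -> new IllegalArgumentException("no ID in " + doctorUrl));
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("id", id);
        doc.put("doctorURL", doctorUrl);
        return doc;
    }

    public static void main(String[] args) {
        String url = "http://www.example.com/doctors/12345/jane-doe-md?ref=search";
        System.out.println(toDocument(url));
    }
}
```

Because the extracted value is plain digits, a single-document lookup on that field needs no escaping, and re-indexing the same URL produces the same key, so updates overwrite instead of duplicating.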