the problem on CDCR of solrCloud

2017-05-22 Thread 魏晓峰
Hello, my name is weixiaofeng. I'm a Java developer from China. Recently we 
started using Solr to search big data, and we have run into trouble with the 
CDCR (Cross Data Center Replication) module of SolrCloud.

The goal of the project is to replicate data to multiple Data Centers to 
support Near Real Time Searching by immediately forwarding updates between 
nodes in the cluster on a per-shard basis.
 
But it doesn't work, and I don't know how to solve this problem. I'm very 
anxious about it. Can you help me? Thank you!

Re: Joins using graph queries - solr 6.0

2017-05-22 Thread mganeshs
Hi, sorry that this reply is not an answer to your post, but I want to know
whether graph queries are working as expected for you. Is traversal working
fine in the graph?

I posted a question over here,
http://lucene.472066.n3.nabble.com/Graph-traversel-td4331207.html#a4331799

but got no response.

So I'm just curious whether graph works for you. Can you share your sample
data and the query you use to traverse the graph?

Thanks,



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Joins-using-graph-queries-solr-6-0-tp4336214p4336455.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Join not working in Solr 6.5

2017-05-22 Thread mganeshs
Thanks for bringing up the performance perspective. Is there any benchmark on
join performance when the number of shards is more than 10 and documents are
indexed based on router.field?

Are you suggesting using streaming expressions instead of router.field, or
using join with router.field and then moving to streaming expressions? Can you
elaborate, please?

Thanks,





Re: different length/size of unique 'id' field value in a collection.

2017-05-22 Thread Derek Poh

Hi Rick

My apologies, I did not make myself clear on the values of the fields. They 
are numbers.
I used 'ts1', 'sup1' and 'pdt1' for simplicity and for ease of 
understanding instead of the actual numbers.


You mentioned this design has the potential for (in error cases) 
concatenating id's incorrectly. Could you explain more on this?


On 5/22/2017 6:12 PM, Rick Leir wrote:

On 2017-05-22 02:25 AM, Derek Poh wrote:

Hi

Due to the source data structure, I need to concatenate the values of 
2 fields ('supplier_id' and 'product_id') to form the unique 'id' of 
each document.
However there are cases where some documents only have 'supplier_id' 
field.
This will result in some documents with a longer/larger 'id' field 
(have both 'supplier_id' and 'product_id') and some with a 
shorter/smaller 'id' field value (has only 'supplier_id').


Please refer to simplified representation of the records below.
3rd record only has supplier id .
ts1 sup1 pdt1
ts1 sup1 pdt2
ts1 sup2
ts1 sup3 pdt3
ts1 sup4 pdt5
ts1 sup4 pdt6

I understand the unique 'id' is used during indexing to check whether 
a document already exists: create it if it does not exist, else update 
it if it exists.


Are there any implications if the unique 'id' field value is of 
different size/length among documents of a collection?

No

Is it advisable to have such design?

Derek
You need unique ID's. This design has the potential for (in error 
cases) concatenating id's incorrectly. It might be better to have ID's 
which are just a number. That said, my current project has ID's which 
are not just a number, YMMV.

cheers -- Rick


Derek







Re: Rule-based Replica Placement not working with Solr 6.5.1

2017-05-22 Thread Damien Kamerman
If you want all the replicas for shard1 on the same port then I think the
rule is: 'shard:shard1,replica:port:8983'

On 22 May 2017 at 18:47, Bernd Fehling 
wrote:

> I tried many settings with "Rule-based Replica Placement" on Solr 6.5.1
> and came to the conclusion that it is not working at all.
>
> My test setup is 6 nodes on 3 servers (port 8983 and 7574 on each server).
>
> The call to create a new collection is
> "http://localhost:8983/solr/admin/collections?action=CREATE&name=boss&
> collection.configName=boss_configs&numShards=3&replicationFactor=2&
> maxShardsPerNode=1&rule=shard:shard1,replica:<2,port:8983"
>
> With "rule=shard:shard1,replica:<2,port:8983" I expect that shard1 has
> only nodes with port 8983 _OR_ it should fail due to "strict mode" because
> the fuzzy operator "~" is not set.
>
> The result of the call is:
> shard1 --> server2:7574 / server1:8983
> shard2 --> server1:7574 / server3:8983
> shard3 --> server2:8983 / server3:7574
>
> The expected result should be (at least!!!) shard1 --> server_x:8983 /
> server_y:8983
> where "_x" and "_y" can be anything between 1 and 3 but must be different.
>
> I think the problem is somewhere in "class ReplicaAssigner" with
> "tryAllPermutations"
> and "tryAPermutationOfRules".
>
> Regards
> Bernd
>


Re: solrcloud replicas not in sync

2017-05-22 Thread Erick Erickson
You can ping individual replicas by addressing to a specific replica
and setting distrib=false, something like

 
http://SOLR_NODE:port/solr/collection1_shard1_replica1/query?distrib=false&q=..

But one thing to check first is that you've committed. I'd:
1> turn off indexing on the source cluster.
2> wait until CDCR has caught up (if necessary).
3> issue a hard commit on the target.
4> _then_ see if the counts are what is expected.

Because autocommit can fire at different wall-clock times even for replicas
of the same shard, following these steps makes it easier to tell whether the
mismatch is just a transient issue. The other thing I've seen people do is
have a timestamp on the docs set to NOW (there's an update processor
that can do this). Then when you check for consistency you can use
fq=timestamp:[* TO NOW - (some interval significantly longer than your
autocommit interval)].
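
A minimal Python sketch of the per-replica check described above. It only
builds the request URL; the node address, core name, the `timestamp` field
name, and the 10-minute window are placeholders, not values from this thread.

```python
from urllib.parse import urlencode


def replica_count_url(base_url, core, window="10MINUTES"):
    """Build a count-only query addressed to a single replica core.

    distrib=false keeps the request on the addressed core instead of
    fanning out to the whole collection. The fq uses Solr date math to
    ignore documents newer than the window, so a replica whose autocommit
    has not fired yet is not reported as inconsistent.
    """
    params = [
        ("q", "*:*"),
        ("rows", "0"),  # only numFound is of interest
        ("distrib", "false"),
        ("fq", f"timestamp:[* TO NOW-{window}]"),
    ]
    return f"{base_url}/solr/{core}/query?{urlencode(params)}"


# Hypothetical node and core names.
url = replica_count_url("http://solr-node1:8983", "collection1_shard1_replica1")
print(url)
```

Running the same query against each replica of a shard and comparing
numFound should show which replica is out of step.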

bq: Is there a way to recover when a shard has inconsistent replicas.
If I use the delete replica API call to delete one of them and then use add
replica to create it from scratch will it auto-populate from the other
replica in the shard?

Yes. Whenever you ADDREPLICA it'll catch itself up from the leader
before becoming active. It'll have to copy the _entire_ index from the
leader, so you'll see network traffic spike.
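
The delete-then-add sequence could be sketched with the Collections API
roughly as below. This is a sketch that only builds the URLs; the host, the
collection name, and the replica name `core_node3` are made-up placeholders
that would have to be read from your own cluster state.

```python
from urllib.parse import urlencode


def collections_api_url(base_url, action, **params):
    """Build a Collections API URL for the given action and parameters."""
    query = urlencode({"action": action, **params})
    return f"{base_url}/solr/admin/collections?{query}"


base = "http://localhost:8983"  # placeholder node address

# Step 1: drop the replica that is out of sync (replica name is hypothetical).
delete_url = collections_api_url(
    base, "DELETEREPLICA",
    collection="collection1", shard="shard1", replica="core_node3",
)

# Step 2: add a fresh replica to the same shard; it copies the index
# from the leader before it becomes active.
add_url = collections_api_url(
    base, "ADDREPLICA", collection="collection1", shard="shard1",
)

print(delete_url)
print(add_url)
```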

Best,
Erick

On Mon, May 22, 2017 at 1:41 PM, Webster Homer  wrote:
> I have a solrcloud collection with 2 shards and 4 replicas. The replicas
> for shard 1 have different numbers of records, so different queries will
> return different numbers of records.
>
> I am not certain how this occurred, it happened in a collection that was a
> cdcr target.
>
> Is there a way to limit a search to a specific replica of a shard? We want
> to understand the differences.
>
> Is there a way to recover when a shard has inconsistent replicas?
> If I use the delete replica API call to delete one of them and then use add
> replica to create it from scratch, will it auto-populate from the other
> replica in the shard?
>
> Thanks,
> Webster
>
> --
>
>
> This message and any attachment are confidential and may be privileged or
> otherwise protected from disclosure. If you are not the intended recipient,
> you must not copy this message or attachment or disclose the contents to
> any other person. If you have received this transmission in error, please
> notify the sender immediately and delete the message and any attachment
> from your system. Merck KGaA, Darmstadt, Germany and any of its
> subsidiaries do not accept liability for any omissions or errors in this
> message which may arise as a result of E-Mail-transmission or for damages
> resulting from any unauthorized changes of the content of this message and
> any attachment thereto. Merck KGaA, Darmstadt, Germany and any of its
> subsidiaries do not guarantee that this message is free of viruses and does
> not accept liability for any damages caused by any virus transmitted
> therewith.
>
> Click http://www.emdgroup.com/disclaimer to access the German, French,
> Spanish and Portuguese versions of this disclaimer.


solrcloud replicas not in sync

2017-05-22 Thread Webster Homer
I have a solrcloud collection with 2 shards and 4 replicas. The replicas
for shard 1 have different numbers of records, so different queries will
return different numbers of records.

I am not certain how this occurred, it happened in a collection that was a
cdcr target.

Is there a way to limit a search to a specific replica of a shard? We want
to understand the differences.

Is there a way to recover when a shard has inconsistent replicas?
If I use the delete replica API call to delete one of them and then use add
replica to create it from scratch, will it auto-populate from the other
replica in the shard?

Thanks,
Webster



Re: Indexing word with plus sign

2017-05-22 Thread Rick Leir
Fundera,
You need a regex which matches a '+' with non-blank chars before and after. It 
should not replace a '+' preceded by white space, because that form is 
significant in Solr query syntax. This is not a perfect solution, but it might 
improve matters for you.
Cheers -- Rick
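
A small Python sketch of such a regex, for illustration only. The
placeholder "xxx" is borrowed from the replacement value quoted earlier in
the thread; the function name is made up. The lookbehind and lookahead
syntax used here is also available in Java regexes, but how it behaves
inside a particular Solr char filter is something to verify, not assume.

```python
import re

# A '+' that has a non-whitespace character on both sides, as in 'i+d'.
INNER_PLUS = re.compile(r"(?<=\S)\+(?=\S)")


def protect_inner_plus(text, placeholder="xxx"):
    """Replace only the '+' characters embedded inside a token."""
    return INNER_PLUS.sub(placeholder, text)


print(protect_inner_plus("i+d"))            # embedded '+' is replaced
print(protect_inner_plus("+solr +lucene"))  # operator-style '+' is kept
```

A '+' at the start of the input or right after a space has no non-blank
character before it, so the "required term" operator is left alone.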

On May 22, 2017 1:58:21 PM EDT, Fundera Developer 
 wrote:
>Thank you Zahid and Erick,
>
>I was going to try the CharFilter suggestion, but then I had doubts. I see
>how the indexing process would handle an occurrence of 'i+d', but what
>happens at query time? If I use the same filter, I could remove '+' chars
>that the user added to mark compulsory tokens in the search, couldn't I?
>However, if I do not use the CharFilter, I would not be able to match the
>'i+d' search tokens...
>
>Thanks all!
>
>
>
>El 22/05/17 a las 16:39, Erick Erickson escribió:
>
>You can also use any of the other tokenizers. WhitespaceTokenizer for
>instance. There are a couple that use regular expressions. Etc. See:
>https://cwiki.apache.org/confluence/display/solr/Tokenizers
>
>Each one has its considerations. WhitespaceTokenizer won't, for
>instance, separate out punctuation so you might then have to use a
>filter to remove those. Regexes can be tricky to get right ;). Etc.
>
>Best,
>Erick
>
>On Mon, May 22, 2017 at 5:26 AM, Muhammad Zahid Iqbal
>
>wrote:
>
>
>Hi,
>
>
>Before applying tokenizer, you can replace your special symbols with
>some
>phrase to preserve it and after tokenized you can replace it back.
>
>For example:
>replacement="xxx" />
>
>
>Thanks,
>Zahid iqbal
>
>On Mon, May 22, 2017 at 12:57 AM, Fundera Developer <
>funderadevelo...@outlook.com>
>wrote:
>
>
>
>Hi all,
>
>I am a bit stuck at a problem that I feel must be easy to solve. In
>Spanish it is usual to find the term 'i+d'. We are working with Solr
>5.5,
>and StandardTokenizer splits 'i' and 'd' and sometimes, as we have in
>the
>index documents both in Spanish and Catalan, and in Catalan it is
>frequent
>to find 'i' as a word, when a user searches for 'i+d' it gets Catalan
>documents as results.
>
>I have tried to use the SynonymFilter, with something like:
>
>i+d => investigacionYdesarrollo
>
>But it does not seem to change anything.
>
>Is there a way I could set an exception to the Tokenizer so that it
>does
>not split this word?
>
>Thanks in advance!

-- 
Sorry for being brief. Alternate email is rickleir at yahoo dot com

Re: Indexing word with plus sign

2017-05-22 Thread Fundera Developer
Thank you Zahid and Erick,

I was going to try the CharFilter suggestion, but then I had doubts. I see how 
the indexing process would handle an occurrence of 'i+d', but what happens at 
query time? If I use the same filter, I could remove '+' chars that the user 
added to mark compulsory tokens in the search, couldn't I? However, if I do 
not use the CharFilter, I would not be able to match the 'i+d' search 
tokens...

Thanks all!



El 22/05/17 a las 16:39, Erick Erickson escribió:

You can also use any of the other tokenizers. WhitespaceTokenizer for
instance. There are a couple that use regular expressions. Etc. See:
https://cwiki.apache.org/confluence/display/solr/Tokenizers

Each one has its considerations. WhitespaceTokenizer won't, for
instance, separate out punctuation so you might then have to use a
filter to remove those. Regexes can be tricky to get right ;). Etc.

Best,
Erick

On Mon, May 22, 2017 at 5:26 AM, Muhammad Zahid Iqbal
 
wrote:


Hi,


Before applying the tokenizer, you can replace your special symbols with some
phrase to preserve them, and after tokenizing you can replace them back.

For example:



Thanks,
Zahid iqbal

On Mon, May 22, 2017 at 12:57 AM, Fundera Developer <
funderadevelo...@outlook.com> wrote:



Hi all,

I am a bit stuck on a problem that I feel must be easy to solve. In
Spanish it is usual to find the term 'i+d'. We are working with Solr 5.5,
and StandardTokenizer splits it into 'i' and 'd'. Our index holds documents
in both Spanish and Catalan, and in Catalan 'i' is a frequent word, so a
user who searches for 'i+d' sometimes gets Catalan documents as results.

I have tried to use the SynonymFilter, with something like:

i+d => investigacionYdesarrollo

But it does not seem to change anything.

Is there a way I could set an exception to the Tokenizer so that it does
not split this word?

Thanks in advance!






without termfeq - returning the number of terms/or regex of terms in a document

2017-05-22 Thread Saman Rasheed
I have an English book whose contents I have successfully indexed into a 
field called 'content', with the following properties:





So if I need the number of occurrences of terms matching a pattern such as 
'*olomo*', and my document contains 'Solomon' twice, the result should be 
'Solomon' with a term frequency of 2.


I've tried going through the term vector section in the reference and various 
other posts on the internet, but I still haven't managed to figure out how.


The nearest I found is the following request:


http://localhost:8983/solr/test/tvrh?q=content:[*%20TO%20*]=true=true=true


which brings my PC to a near halt for a couple of minutes and then returns the 
term frequency of every term! But I only need the term frequency of terms 
matching a particular pattern/regex.


Is there a way to narrow it down to just one regex term, e.g. *thing*, so it 
will find soothing, something, everything, each with its number of occurrences 
in the document?
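
To make the desired output concrete, here is a client-side Python sketch of
the counting itself. It is not a Solr feature or request; it assumes the
document text is available as a string, and the simple `\w+` tokenization
is a stand-in for whatever analysis chain the 'content' field really uses.

```python
import re
from collections import Counter


def matching_term_counts(text, pattern):
    """Count each distinct token that contains a match for `pattern`."""
    tokens = re.findall(r"\w+", text)
    regex = re.compile(pattern, re.IGNORECASE)
    return Counter(token for token in tokens if regex.search(token))


# Made-up sample text.
sample = "Solomon spoke, and everything Solomon said was something new."

print(matching_term_counts(sample, r"olomo"))
print(matching_term_counts(sample, r"thing"))
```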


thanks,



Re: cursorMark value causes Request-URI Too Long excpetion

2017-05-22 Thread Chris Hostetter

: I've been using cursorMark for quite a while, but I noticed that sometimes
: the value is huge (more than 8K). It results in Request-URI Too Long

FWIW: cursorMark values are simple "string safe" encoded forms of sort 
fields -- so my guess is you are sorting on some really long string 
values?

Generally speaking, independent of cursorMark, you might want to see if you 
can normalize/truncate some of the string fields you are sorting on ... 
that should improve distributed sort performance overall.
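
One way to picture the normalize/truncate idea is a small helper applied
before indexing into a dedicated sort field. This is only a sketch: the
function name and the 64-character limit are arbitrary assumptions, and the
same effect could be achieved elsewhere in an indexing pipeline.

```python
def sort_key(value, max_len=64):
    """Collapse whitespace, lowercase, and cap the length of a sort value."""
    normalized = " ".join(value.split()).lower()
    return normalized[:max_len]


print(sort_key("  A Very   Long Title  "))
print(len(sort_key("x" * 1000)))
```

Values that share the same first 64 characters will tie on this field, so
the sort still relies on the uniqueKey tie-breaker that cursorMark requires.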



-Hoss
http://www.lucidworks.com/


Re: LukeRequestHandler not returning all fields in the index

2017-05-22 Thread Yago Riveiro
OK ... then I have no way to know the full list of fields in my collection
without doing a LukeRequest to each of the shards and merging the results at
the end, do I?

Streaming expressions don't allow the * wildcard, and the LukeRequest doesn't
return all fields ... so there is no simple programmatic way to pull all data
from a collection :/
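
The merge step itself is small. A Python sketch, assuming each shard's Luke
response has already been fetched and parsed into a dict with a top-level
"fields" mapping; the sample responses and field names below are invented.

```python
def merge_luke_fields(shard_responses):
    """Union the field maps of several per-shard Luke responses.

    The first shard that reports a field supplies its details; later
    shards only contribute fields that have not been seen yet.
    """
    merged = {}
    for response in shard_responses:
        for name, info in response.get("fields", {}).items():
            merged.setdefault(name, info)
    return merged


# Invented sample data: only shard 2 holds a document with a dynamic field.
shard1 = {"fields": {"id": {"type": "string"}, "title": {"type": "text"}}}
shard2 = {"fields": {"id": {"type": "string"}, "price_d": {"type": "double"}}}

all_fields = merge_luke_fields([shard1, shard2])
print(sorted(all_fields))
```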

Thanks for the answer Erick. 



-
Best regards

/Yago


Re: Join not working in Solr 6.5

2017-05-22 Thread Erick Erickson
This will likely be "interesting" from a performance perspective. You
might try Streaming, especially Streaming Expressions and Parallel SQL,
depending on what you need this for.

Best,
Erick

On Mon, May 22, 2017 at 12:05 AM, Damien Kamerman  wrote:
> I use a router.field so docs that I join from/to are always in the same
> shard.  See
> https://cwiki.apache.org/confluence/display/solr/Shards+and+Indexing+Data+in+SolrCloud#ShardsandIndexingDatainSolrCloud-DocumentRouting
>
> There is an open ticket SOLR-8297
> https://issues.apache.org/jira/browse/SOLR-8297 Allow join query over 2
> sharded collections: enhance functionality and exception handling
>
>
>
> On 22 May 2017 at 16:01, mganeshs  wrote:
>
>> Is there any possibility of supporting joins across multiple shards in near
>> future ? How to achieve the join when our data is spread-ed across multiple
>> shards. This is very much mandatory when we need to scale out.
>>
>> Any workarounds if out-of-box possibility is not there ?
>>
>> Thanks,
>>
>>
>>
>>
>>
>> --
>> View this message in context: http://lucene.472066.n3.
>> nabble.com/Join-not-working-in-Solr-6-5-tp4336247p4336256.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>>


Re: Solr Analyzer for Vietnamese

2017-05-22 Thread Erick Erickson
Eirik:

That code is 4 years old and for Lucene 4. I doubt it applies cleanly
to the current code base. Feel free to give it a try, but it's not
guaranteed to work.

I know of no other Vietnamese analyzers available.

Dat is active in the community, but I don't know whether he has plans to
update/commit that bit of code.

Best,
Erick

On Mon, May 22, 2017 at 12:25 AM, Eirik Hungnes
 wrote:
> Hi,
>
> There doesn't seem to be any Tokenizer / Analyzer for Vietnamese built in
> to Lucene at the moment. Does anyone know if something like this exists
> today or is planned? We found
> https://github.com/CaoManhDat/VNAnalyzer, made by Cao Manh Dat, but we are
> not sure if it's up to date. Any info highly appreciated!
>
> Thanks,
>
> Eirik


Re: LukeRequestHandler not returning all fields in the index

2017-05-22 Thread Erick Erickson
Luke really doesn't operate at a level that knows about collections
and the like, see: https://issues.apache.org/jira/browse/SOLR-8127.

So far there hasn't been much interest in extending it to the
collection level particularly because it's intended to get you
low-level index characteristics.

Not really a bug since it's never been intended to do what you're
asking, but could certainly be an improvement.

Best,
Erick

On Mon, May 22, 2017 at 4:50 AM, Yago Riveiro  wrote:
> I'm struggling with a situation that I think may be a bug.
>
> The LukeRequestHandler is not returning all fields that exist in one
> collection with 12 shards on 12 nodes (1 shard on each node).
>
> Running the request "http://localhost:8983/solr/collection/admin/luke" on
> each node, the list of fields is the same on all nodes except one. The
> difference is that one shard holds a document with dynamic fields that do
> not exist in the other shards.
>
> Is this the normal behaviour, or should it return all fields in all shards?
>
> Regards
>
>
>
> -
> Best regards
>
> /Yago
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/LukeRequestHandler-not-returning-all-fields-in-the-index-tp4336287.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: Indexing word with plus sign

2017-05-22 Thread Erick Erickson
You can also use any of the other tokenizers. WhitespaceTokenizer for
instance. There are a couple that use regular expressions. Etc. See:
https://cwiki.apache.org/confluence/display/solr/Tokenizers

Each one has its considerations. WhitespaceTokenizer won't, for
instance, separate out punctuation so you might then have to use a
filter to remove those. Regexes can be tricky to get right ;). Etc.

Best,
Erick

On Mon, May 22, 2017 at 5:26 AM, Muhammad Zahid Iqbal
 wrote:
> Hi,
>
>
> Before applying the tokenizer, you can replace your special symbols with some
> phrase to preserve them, and after tokenizing you can replace them back.
>
> For example:
>  replacement="xxx" />
>
>
> Thanks,
> Zahid iqbal
>
> On Mon, May 22, 2017 at 12:57 AM, Fundera Developer <
> funderadevelo...@outlook.com> wrote:
>
>> Hi all,
>>
>> I am a bit stuck at a problem that I feel must be easy to solve. In
>> Spanish it is usual to find the term 'i+d'. We are working with Solr 5.5,
>> and StandardTokenizer splits 'i' and 'd' and sometimes, as we have in the
>> index documents both in Spanish and Catalan, and in Catalan it is frequent
>> to find 'i' as a word, when a user searches for 'i+d' it gets Catalan
>> documents as results.
>>
>> I have tried to use the SynonymFilter, with something like:
>>
>> i+d => investigacionYdesarrollo
>>
>> But it does not seem to change anything.
>>
>> Is there a way I could set an exception to the Tokenizer so that it does
>> not split this word?
>>
>> Thanks in advance!
>>
>>


Re: Nested Document is flattened even with @Field(child = true) annotation

2017-05-22 Thread Mikhail Khludnev
Hello!
Since you are talking about Banana, you might be interested in faceting.
You can probably have child docs in the results and facet on them, but this
gives child-level counts. If you need parent-level counts by child fields,
there are two ways to do so; see
http://blog-archive.griddynamics.com/2016/03/block-join-faceting-implementation.html

Overall, exposing nested docs in Banana is exciting research. Let us know
how it goes.

On Mon, May 22, 2017 at 4:39 PM, biplobbiswas  wrote:

> Rick Leir-2 wrote
> > Yes! And the join queries get complicated. Yonik has some good blogs on
> > this.
> >
> > On May 19, 2017 11:05:52 AM EDT, biplobbiswas 
>
> > revolutionisme+solr@
>
> >  wrote:
> >>Wait, if I understand correctly, the documents would be indexed like
> >>that but
> >>we can get back the document as nested if we perform the
> >>blockjoinqueryparsing?
> >>
> >>So if I query normally with the default parser I would get all
> >>documents
> >>separately?
> >>Did i understand correctly?
> >>
> >>Thanks & regards
> >>Biplob
> >>
> >>
> >>
> >>--
> >>View this message in context:
> >>http://lucene.472066.n3.nabble.com/Nested-Document-is-
> flattened-even-with-Field-child-true-annotation-tp4335877p4335911.html
> >>Sent from the Solr - User mailing list archive at Nabble.com.
> >
> > --
> > Sorry for being brief. Alternate email is rickleir at yahoo dot com
>
>
> Thanks a lot for that reply.
>
> Now I am trying to integrate the results into a Banana dashboard. Is there
> any way to visualize the _childDocuments_ as well? Currently the nested
> data is part of _childDocuments_ and is represented in JSON format, so I
> can't generate any statistics on top of the nested documents.
>
>
>
> --
> View this message in context: http://lucene.472066.n3.
> nabble.com/Nested-Document-is-flattened-even-with-Field-
> child-true-annotation-tp4335877p4336296.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>



-- 
Sincerely yours
Mikhail Khludnev


Re: HttpSolrCall CollectionRequest collection name

2017-05-22 Thread Colm O hEigeartaigh
Just a single Solr instance.

Colm.

On Mon, May 22, 2017 at 2:47 PM, Susheel Kumar 
wrote:

> Hi Colm, what do you mean when you refer to "Standalone Solr"? Did you
> set up SolrCloud or just a single Solr instance?
>
> Thnx
>
> On Mon, May 22, 2017 at 8:50 AM, Colm O hEigeartaigh 
> wrote:
>
>> Hi all,
>>
>> Setup: Standalone Solr with a "demo" collection name
>>
>> In HttpSolrCall.getAuthCtx - it is creating a CollectionRequest where
>> collectionRequest.collectionName == null when I call:
>>
>> curl http://localhost:8983/solr/demo/get?id=xyz
>>
>> Is there a reason why it doesn't try to extract the collection name from
>> the path in HttpSolrCall.getAuthCtx for this scenario? If I have a custom
>> authorization plugin, how can I see what the requested collection is in
>> this case?
>>
>> Apologies in advance if I am missing something obvious here.
>>
>> Thanks,
>>
>> Colm.
>>
>> --
>> Colm O hEigeartaigh
>>
>> Talend Community Coder
>> http://coders.talend.com
>>
>
>


-- 
Colm O hEigeartaigh

Talend Community Coder
http://coders.talend.com


Re: HttpSolrCall CollectionRequest collection name

2017-05-22 Thread Susheel Kumar
Hi Colm, what do you mean when you refer to "Standalone Solr"? Did you
set up SolrCloud or just a single Solr instance?

Thnx

On Mon, May 22, 2017 at 8:50 AM, Colm O hEigeartaigh 
wrote:

> Hi all,
>
> Setup: Standalone Solr with a "demo" collection name
>
> In HttpSolrCall.getAuthCtx - it is creating a CollectionRequest where
> collectionRequest.collectionName == null when I call:
>
> curl http://localhost:8983/solr/demo/get?id=xyz
>
> Is there a reason why it doesn't try to extract the collection name from
> the path in HttpSolrCall.getAuthCtx for this scenario? If I have a custom
> authorization plugin, how can I see what the requested collection is in
> this case?
>
> Apologies in advance if I am missing something obvious here.
>
> Thanks,
>
> Colm.
>
> --
> Colm O hEigeartaigh
>
> Talend Community Coder
> http://coders.talend.com
>


Re: Nested Document is flattened even with @Field(child = true) annotation

2017-05-22 Thread biplobbiswas
Rick Leir-2 wrote
> Yes! And the join queries get complicated. Yonik has some good blogs on
> this.
> 
> On May 19, 2017 11:05:52 AM EDT, biplobbiswas 

> revolutionisme+solr@

>  wrote:
>>Wait, if I understand correctly, the documents would be indexed like
>>that but
>>we can get back the document as nested if we perform the
>>blockjoinqueryparsing? 
>>
>>So if I query normally with the default parser I would get all
>>documents
>>separately? 
>>Did i understand correctly? 
>>
>>Thanks & regards
>>Biplob
>>
>>
>>
>>--
>>View this message in context:
>>http://lucene.472066.n3.nabble.com/Nested-Document-is-flattened-even-with-Field-child-true-annotation-tp4335877p4335911.html
>>Sent from the Solr - User mailing list archive at Nabble.com.
> 
> -- 
> Sorry for being brief. Alternate email is rickleir at yahoo dot com


Thanks a lot for that reply.

Now I am trying to integrate the results into a Banana dashboard. Is there
any way to visualize the _childDocuments_ as well? Currently the nested data
is part of _childDocuments_ and is represented in JSON format, so I can't
generate any statistics on top of the nested documents.





HttpSolrCall CollectionRequest collection name

2017-05-22 Thread Colm O hEigeartaigh
Hi all,

Setup: Standalone Solr with a "demo" collection name

In HttpSolrCall.getAuthCtx - it is creating a CollectionRequest where
collectionRequest.collectionName == null when I call:

curl http://localhost:8983/solr/demo/get?id=xyz

Is there a reason why it doesn't try to extract the collection name from
the path in HttpSolrCall.getAuthCtx for this scenario? If I have a custom
authorization plugin, how can I see what the requested collection is in
this case?
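
For illustration, the path parsing being asked about looks roughly like the
Python sketch below. It does not describe what HttpSolrCall does internally
and is not plugin code; it assumes the request path is visible to the
caller, that the context path is "/solr", and that "admin" paths carry no
core name. All of those are assumptions of this sketch.

```python
from urllib.parse import urlsplit


def core_from_path(url, context="/solr"):
    """Return the first path segment after the context path, or None."""
    path = urlsplit(url).path
    prefix = context + "/"
    if not path.startswith(prefix):
        return None
    parts = path[len(prefix):].split("/")
    # Expect "<name>/<handler>"; treat admin paths as having no core name.
    if len(parts) < 2 or parts[0] == "admin":
        return None
    return parts[0]


print(core_from_path("http://localhost:8983/solr/demo/get?id=xyz"))
```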

Apologies in advance if I am missing something obvious here.

Thanks,

Colm.

-- 
Colm O hEigeartaigh

Talend Community Coder
http://coders.talend.com


Re: Indexing word with plus sign

2017-05-22 Thread Muhammad Zahid Iqbal
Hi,


Before applying the tokenizer, you can replace your special symbols with some
phrase to preserve them, and after tokenizing you can replace them back.

For example:
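
A rough Python sketch of the replace, tokenize, restore round trip, shown
outside Solr purely to illustrate the idea. The placeholder "xxx" and the
simple `\w+` tokenizer are assumptions of this sketch, not a Solr analysis
chain.

```python
import re

PLACEHOLDER = "xxx"  # assumed not to occur in real words


def tokenize_preserving_plus(text):
    """Tokenize text while keeping terms such as 'i+d' in one piece."""
    # 1. Protect a '+' that sits between two word characters.
    protected = re.sub(r"(?<=\w)\+(?=\w)", PLACEHOLDER, text)
    # 2. Tokenize; a plain tokenizer would otherwise split on '+'.
    tokens = re.findall(r"\w+", protected.lower())
    # 3. Put the original symbol back into each token.
    return [token.replace(PLACEHOLDER, "+") for token in tokens]


print(tokenize_preserving_plus("gasto en i+d"))
```

The obvious weakness is a real word that already contains the placeholder,
so a less likely placeholder string would be a safer choice in practice.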



Thanks,
Zahid iqbal

On Mon, May 22, 2017 at 12:57 AM, Fundera Developer <
funderadevelo...@outlook.com> wrote:

> Hi all,
>
> I am a bit stuck on a problem that I feel must be easy to solve. In
> Spanish it is usual to find the term 'i+d'. We are working with Solr 5.5,
> and StandardTokenizer splits it into 'i' and 'd'. Our index holds documents
> in both Spanish and Catalan, and in Catalan 'i' is a frequent word, so a
> user who searches for 'i+d' sometimes gets Catalan documents as results.
>
> I have tried to use the SynonymFilter, with something like:
>
> i+d => investigacionYdesarrollo
>
> But it does not seem to change anything.
>
> Is there a way I could set an exception to the Tokenizer so that it does
> not split this word?
>
> Thanks in advance!
>
>


LukeRequestHandler not returning all fields in the index

2017-05-22 Thread Yago Riveiro
I'm struggling with a situation that I think may be a bug.

The LukeRequestHandler is not returning all fields that exist in one
collection with 12 shards on 12 nodes (1 shard on each node).

Running the request "http://localhost:8983/solr/collection/admin/luke" on
each node, the list of fields is the same on all nodes except one. The
difference is that one shard holds a document with dynamic fields that do
not exist in the other shards.

Is this the normal behaviour, or should it return all fields in all shards?

Regards



-
Best regards

/Yago


Multiple Solr configuration / how to check setting

2017-05-22 Thread Arvind
Dear ,


Could you please advise whether we can configure multiple Solr master servers?

If someone has configured this, can we check that configuration in the Solr
admin dashboard?

Once a multi-server Solr configuration is complete, how does it stay in sync
with the index?

thanks,
Arvind 






Re: different length/size of unique 'id' field value in a collection.

2017-05-22 Thread Rick Leir

On 2017-05-22 02:25 AM, Derek Poh wrote:

Hi

Due to the source data structure, I need to concatenate the values of 
2 fields ('supplier_id' and 'product_id') to form the unique 'id' of 
each document.
However there are cases where some documents only have 'supplier_id' 
field.
This will result in some documents with a longer/larger 'id' field 
(have both 'supplier_id' and 'product_id') and some with a 
shorter/smaller 'id' field value (has only 'supplier_id').


Please refer to simplified representation of the records below.
3rd record only has supplier id .
ts1 sup1 pdt1
ts1 sup1 pdt2
ts1 sup2
ts1 sup3 pdt3
ts1 sup4 pdt5
ts1 sup4 pdt6

I understand the unique 'id' is used during indexing to check whether a 
document already exists: create it if it does not exist, else update it 
if it exists.


Are there any implications if the unique 'id' field value is of 
different size/length among documents of a collection?

No

Is it advisable to have such design?

Derek
You need unique IDs. This design has the potential (in error cases) to
concatenate IDs incorrectly. It might be better to have IDs which
are just a number. That said, my current project has IDs which are not
just a number, YMMV.
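
To illustrate the concatenation risk, here is a minimal sketch of building the 'id' with an explicit separator. The function name, the "_" separator and the sample values are assumptions for illustration only; any separator that cannot occur inside the component values would do.

```python
def make_id(supplier_id, product_id=None, sep="_"):
    """Join the id components with a separator; product_id is optional."""
    parts = [supplier_id] if not product_id else [supplier_id, product_id]
    for part in parts:
        if not part:
            raise ValueError("empty id component")
        if sep in part:
            # A separator inside a component would make the id ambiguous again.
            raise ValueError(f"component {part!r} contains separator {sep!r}")
    return sep.join(parts)


# Plain concatenation can map two different records to the same id ...
naive_a = "sup1" + "1"   # supplier 'sup1', product '1'
naive_b = "sup11" + ""   # supplier 'sup11', no product

# ... while the separated form keeps them apart.
safe_a = make_id("sup1", "1")
safe_b = make_id("sup11")
```

Ids of different lengths (with or without a product part) stay distinct as long as the separator never appears inside a component.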

cheers -- Rick


> Derek




Rule-based Replica Placement not working with Solr 6.5.1

2017-05-22 Thread Bernd Fehling
I tried many settings with "Rule-based Replica Placement" on Solr 6.5.1
and came to the conclusion that it is not working at all.

My test setup is 6 nodes on 3 servers (port 8983 and 7574 on each server).

The call to create a new collection is
"http://localhost:8983/solr/admin/collections?action=CREATE&name=boss&
collection.configName=boss_configs&numShards=3&replicationFactor=2&
maxShardsPerNode=1&rule=shard:shard1,replica:<2,port:8983"

With "rule=shard:shard1,replica:<2,port:8983" I expect that shard1 has
only nodes with port 8983 _OR_ that it should fail due to "strict mode",
because the fuzzy operator "~" is not set.
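
For reference, a minimal sketch of assembling such a CREATE request with every parameter spelled out. It assumes the standard Collections API parameter names (name, numShards, replicationFactor, maxShardsPerNode, rule); the helper name is hypothetical, and the code only builds the URL without contacting Solr.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def build_create_url(solr_base, params):
    """Return a Collections API URL; urlencode percent-escapes ':', ',' and '<'."""
    return f"{solr_base}/admin/collections?{urlencode(params)}"


create_params = {
    "action": "CREATE",
    "name": "boss",
    "collection.configName": "boss_configs",
    "numShards": 3,
    "replicationFactor": 2,
    "maxShardsPerNode": 1,
    "rule": "shard:shard1,replica:<2,port:8983",
}
url = build_create_url("http://localhost:8983/solr", create_params)

# Decoding the query string again recovers the rule exactly as written.
decoded = parse_qs(urlsplit(url).query)
```

Escaping the rule value this way avoids the "<" and "," characters being mangled by a shell or an HTTP client.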

The result of the call is:
shard1 --> server2:7574 / server1:8983
shard2 --> server1:7574 / server3:8983
shard3 --> server2:8983 / server3:7574

The expected result should be (at least!!!) shard1 --> server_x:8983 / 
server_y:8983
where "_x" and "_y" can be anything between 1 and 3 but must be different.
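
The reported placement can be checked mechanically by counting, per shard, how many replicas sit on nodes with a given port. This is only a sketch over the result listed above; the function name is hypothetical and the node strings are copied from the output as "host:port".

```python
def replicas_on_port(placement, port):
    """Count, for each shard, the replicas hosted on nodes using this port."""
    suffix = f":{port}"
    return {
        shard: sum(1 for node in nodes if node.endswith(suffix))
        for shard, nodes in placement.items()
    }


# Placement as reported by the CREATE call.
reported = {
    "shard1": ["server2:7574", "server1:8983"],
    "shard2": ["server1:7574", "server3:8983"],
    "shard3": ["server2:8983", "server3:7574"],
}
counts = replicas_on_port(reported, 8983)
```

For shard1 the count is 1, whereas the expectation stated above corresponds to a count of 2. Whether a count of 1 satisfies "replica:<2" depends on how that condition is read.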

I think the problem is somewhere in "class ReplicaAssigner" with 
"tryAllPermutations"
and "tryAPermutationOfRules".

Regards
Bernd


Solr Analyzer for Vietnamese

2017-05-22 Thread Eirik Hungnes
Hi,

There doesn't seem to be any Tokenizer / Analyzer for Vietnamese built in
to Lucene at the moment. Does anyone know if something like this exists
today or is planned? We found this
https://github.com/CaoManhDat/VNAnalyzer made by Cao Manh Dat, but we are not
sure if it's up to date. Any info highly appreciated!

Thanks,

Eirik


Re: Join not working in Solr 6.5

2017-05-22 Thread Damien Kamerman
I use a router.field so docs that I join from/to are always in the same
shard.  See
https://cwiki.apache.org/confluence/display/solr/Shards+and+Indexing+Data+in+SolrCloud#ShardsandIndexingDatainSolrCloud-DocumentRouting
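
The idea can be shown with a toy router: if the shard is chosen from the value of one routing field, every document sharing that value lands on the same shard, so a join between them stays local. This is only a conceptual sketch; crc32 is a stand-in, Solr's own hash function and hash-range assignment are not reproduced here, and the field name 'parent_id' is hypothetical.

```python
import zlib
from collections import defaultdict


def shard_for(route_value, num_shards):
    """Pick a shard index from the routing field value (toy stand-in hash)."""
    return zlib.crc32(route_value.encode("utf-8")) % num_shards


# Parent and child documents carry the same value in the routing field.
docs = [
    {"id": "p1", "parent_id": "p1"},
    {"id": "c1", "parent_id": "p1"},
    {"id": "c2", "parent_id": "p1"},
    {"id": "p2", "parent_id": "p2"},
    {"id": "c3", "parent_id": "p2"},
]

# Collect the shard(s) used for each routing value.
shards_by_route = defaultdict(set)
for doc in docs:
    shards_by_route[doc["parent_id"]].add(shard_for(doc["parent_id"], 3))
```

Each routing value maps to exactly one shard, regardless of the document's own 'id'.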

There is an open ticket, SOLR-8297 ("Allow join query over 2 sharded
collections: enhance functionality and exception handling"):
https://issues.apache.org/jira/browse/SOLR-8297



On 22 May 2017 at 16:01, mganeshs  wrote:

> Is there any possibility of supporting joins across multiple shards in the
> near future? How can we achieve a join when our data is spread across
> multiple shards? This is essential when we need to scale out.
>
> Are there any workarounds if this is not possible out of the box?
>
> Thanks,


different length/size of unique 'id' field value in a collection.

2017-05-22 Thread Derek Poh

Hi

Due to the source data structure, I need to concatenate the values of 2 
fields ('supplier_id' and 'product_id') to form the unique 'id' of each 
document.

However, there are cases where some documents have only the 'supplier_id' field.
This will result in some documents with a longer/larger 'id' field value
(having both 'supplier_id' and 'product_id') and some with a shorter/smaller
'id' field value (having only 'supplier_id').


Please refer to the simplified representation of the records below.
The 3rd record has only a supplier id.
ts1 sup1 pdt1
ts1 sup1 pdt2
ts1 sup2
ts1 sup3 pdt3
ts1 sup4 pdt5
ts1 sup4 pdt6

I understand the unique 'id' is used during indexing to check whether a
document already exists: the document is created if it does not exist and
updated if it does.
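
The behaviour described above can be modelled with a toy in-memory index keyed by 'id'. This is a sketch of the concept only, not of Solr's implementation; the function name and sample documents are hypothetical.

```python
def add_document(index, doc):
    """Store doc under its 'id'; report whether it was created or replaced."""
    existed = doc["id"] in index
    index[doc["id"]] = doc  # a document with the same id replaces the old one
    return "updated" if existed else "created"


index = {}
results = [
    add_document(index, {"id": "sup1_pdt1", "name": "first"}),
    add_document(index, {"id": "sup2", "name": "supplier only"}),
    add_document(index, {"id": "sup1_pdt1", "name": "first, revised"}),
]
```

The shorter id ("sup2") and the longer one ("sup1_pdt1") are handled identically; only equality of the whole value matters.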


Are there any implications if the unique 'id' field value is of 
different size/length among documents of a collection?

Is it advisable to have such a design?

Derek




Re: Join not working in Solr 6.5

2017-05-22 Thread mganeshs
Is there any possibility of supporting joins across multiple shards in the
near future? How can we achieve a join when our data is spread across
multiple shards? This is essential when we need to scale out.

Are there any workarounds if this is not possible out of the box?

Thanks,




