RE: [Solr8.7] Performance of group.ngroups ?

2021-01-15 Thread Bruno Mannina
Hello,



I found a temporary solution to my problem.



I do a request without ngroups=true => the result comes back quickly.

And just afterwards, I do a simple request with my query and this parameter:

….={x:"unique(fid)"}

where the field "fid" is my grouping field name.



88 sec => 3~4 sec for both requests.
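For reference, the two-request workaround above can be sketched as two HTTP request URLs. The collection name, the query, and the `json.facet` parameter name are assumptions (the original message elides the parameter name), so treat this as an illustration only:

```python
# Hedged sketch of the two-request workaround.  Collection name, query,
# and the "json.facet" parameter name are assumptions.
from urllib.parse import urlencode

base = "http://localhost:8983/solr/mycollection/select"

# Request 1: grouped results, but without the expensive group.ngroups=true
grouped = base + "?" + urlencode({
    "q": "solar panel",
    "group": "true",
    "group.field": "fid",
})

# Request 2: same query, rows=0, unique() aggregation to count the groups
count = base + "?" + urlencode({
    "q": "solar panel",
    "rows": "0",
    "json.facet": '{x:"unique(fid)"}',
})

print(grouped)
print(count)
```

The second request returns no documents (rows=0); only the facet result carries the distinct-group count.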



Regards,

Bruno



From: Matheo Software [mailto:i...@matheo-software.com]
Sent: Thursday, January 14, 2021 14:48
To: solr-user@lucene.apache.org
Subject: [Solr8.7] Performance of group.ngroups ?



Hi All,



I have more than 130 million documents, with an index size of more than
400GB on Solr8.7.



I do a simple query and it takes around 1400ms, which is OK, but when I use
ngroups=true, I get the answer in 88 sec.

I know it's because Solr calculates the number of groups on a specific field,
but does a solution exist to improve that? An alternative solution?



Many thanks,



Cordialement, Best Regards

Bruno Mannina

 <http://www.matheo-software.com> www.matheo-software.com

 <http://www.patent-pulse.com> www.patent-pulse.com

Tél. +33 0 970 738 743

Mob. +33 0 634 421 817








Virus-free. www.avast.com







RE: [Solr8.7] UI request reply empty after 8s

2021-01-14 Thread Bruno Mannina
Hi,

Perfect !
it works when I increase the config.timeout (row 597).

The file app.js can be found here:
/opt/solr/server/solr-webapp/webapp/js/angular
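A rough sketch of the edit described above. The pattern `timeout: 10000` is an assumption; check the real spelling in `/opt/solr/server/solr-webapp/webapp/js/angular/app.js` (around row 597 in Solr 8.7) before editing the file:

```python
# Hedged sketch: bump the hard-coded Admin UI timeout in app.js.
# "timeout: 10000" is an assumed spelling of the config line.
import re

sample = "var config = { timeout: 10000 };"  # stand-in for the app.js line
patched = re.sub(r"timeout:\s*\d+", "timeout: 60000", sample)
print(patched)  # -> var config = { timeout: 60000 };
```

After editing the real file, reload the Admin UI page so the browser picks up the changed script.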


-----Original Message-----
From: ufuk yılmaz [mailto:uyil...@vivaldi.net.INVALID]
Sent: Wednesday, January 13, 2021 14:57
To: solr-user@lucene.apache.org
Subject: RE: [Solr8.7] UI request reply empty after 8s

Hi,

A while ago I asked the same thing here. Looking at the source JavaScript code
of the frontend app, I saw a 10,000-millisecond timeout config in the
httpInterceptor inside app.js. I changed it to something much larger and the
results of long queries began to show.

Hope it helps

Sent from Mail for Windows 10

From: Bruno Mannina
Sent: 13 January 2021 16:39
To: solr-user@lucene.apache.org
Subject: [Solr8.7] UI request reply empty after 8s

Hi All,



I'm facing a problem with my Solr8.7.

When I do a query on my collection from the Solr UI, nothing happens if the
request takes more than 8s.

I mean, the Solr UI returns a blank page. No error in the log, no response,
nothing at all.

And if I run the same request again right afterwards, the answer appears
quickly, as expected.



Is there a UI timeout somewhere?



Thanks,

Bruno











[Solr8.7] UI request reply empty after 8s

2021-01-13 Thread Bruno Mannina
Hi All,



I'm facing a problem with my Solr8.7.

When I do a query on my collection from the Solr UI, nothing happens if the
request takes more than 8s.

I mean, the Solr UI returns a blank page. No error in the log, no response,
nothing at all.

And if I run the same request again right afterwards, the answer appears
quickly, as expected.



Is there a UI timeout somewhere?



Thanks,

Bruno







RE: [solr8.7] not relevant results for chinese query

2021-01-11 Thread Bruno Mannina
Hi,

With this article (
https://opensourceconnections.com/blog/2011/12/23/indexing-chinese-in-solr/ ),
I am beginning to understand what happens.

Has someone already tried the Paoding algorithm with a recent Solr?


Thanks,
Bruno

-----Original Message-----
From: Bruno Mannina [mailto:bmann...@free.fr]
Sent: Sunday, January 10, 2021 17:57
To: solr-user@lucene.apache.org
Subject: [solr8.7] not relevant results for chinese query

Hello,



I try to use chinese language with my index.



My definition is:

    [fieldType XML stripped by the mailing-list archive renderer]


But I get too many irrelevant results.



e.g., with the query (phone case):

tizh:(手機殼)

my query is translated to:

tizh:(手 OR 機 OR 殼)



But:

tizh:(手 AND 機 AND 殼)

returns 0 results.

And:

tizh:"手機殼"

also returns 0 results.



Is it possible to improve my fieldType, or must I add something else?
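A hedged sketch of a smartcn-based field type, modeled on the text_zh example in the Solr 8 Reference Guide; the exact filter set and the stopword path are assumptions, so adapt it to your schema before use:

```xml
<!-- Hedged sketch modeled on the Solr 8 ref guide smartcn example;
     filter choice and stopword path are assumptions. -->
<fieldType name="text_zh" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.HMMChineseTokenizerFactory"/>
    <filter class="solr.CJKWidthFilterFactory"/>
    <filter class="solr.StopFilterFactory"
            words="org/apache/lucene/analysis/cn/smart/stopwords.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

Whether 手機殼 becomes one token or three is decided by the tokenizer; the Admin UI analysis page shows exactly which tokens a phrase query will have to match.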



Thanks,

Bruno










[solr8.7] not relevant results for chinese query

2021-01-10 Thread Bruno Mannina
Hello,

 

I try to use chinese language with my index.

 

My definition is:

    [fieldType XML stripped by the mailing-list archive renderer]


But I get too many irrelevant results.

 

e.g., with the query (phone case):

tizh:(手機殼)

my query is translated to:

tizh:(手 OR 機 OR 殼)

 

But:

tizh:(手 AND 機 AND 殼)

returns 0 results.

And:

tizh:"手機殼"

also returns 0 results.

 

Is it possible to improve my fieldType, or must I add something else?

 

Thanks,

Bruno

 





RE: [Solr8.7] Chinese ZH language ?

2021-01-10 Thread Bruno Mannina
Yes, that was it; I re-indexed and it works fine.

Thanks !

-Message d'origine-
De : Alexandre Rafalovitch [mailto:arafa...@gmail.com]
Envoyé : dimanche 10 janvier 2021 16:44
À : solr-user
Objet : Re: [Solr8.7] Chinese ZH language ?

>possible analysis error: cannot change field "tizh" from

You have content indexed against the old, incompatible definition. Deleted
but not purged records count too.

Delete your index data, or change the field name during testing.
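A minimal sketch of the delete-all step before reindexing. The host, port and the core name "yyy" are placeholders, and the `urlopen` call is commented out so nothing is sent by accident:

```python
# Hedged sketch of a Solr delete-all request prior to a full reindex.
# Host, port and core name are placeholders.
import urllib.request

core_url = "http://localhost:8983/solr/yyy/update?commit=true"
payload = b"<delete><query>*:*</query></delete>"

req = urllib.request.Request(
    core_url,
    data=payload,
    headers={"Content-Type": "text/xml"},
    method="POST",
)
# urllib.request.urlopen(req)  # uncomment to actually send the delete
print(req.get_method(), req.full_url)
```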

Regards,
Alex
On Sun., Jan. 10, 2021, 9:19 a.m. Bruno Mannina,  wrote:

> Hello,
>
>
>
> I would like to index simplified chinese ZH language (i.e. 一种新型太阳能坪
> 床增温系统),
>
> I added in my solrconfig the lib:
>
> <lib dir="${solr.install.dir:../../..}/contrib/analysis-extras/lucene-libs/"
>      regex="lucene-analyzers-smartcn-8\.7\.0\.jar" />
>
>
>
> First question: Is it enough ?
>
>
>
> But now I need your help to define the fieldtype “text_zh” in my
> schema.xml to use with:
>
> (PS: As other fields, I need highlight)
>
>
>
> <field name="tizh" type="text_zh" indexed="true" stored="true"
>        termVectors="true" termPositions="true" termOffsets="true"/>
>
>
>
> And
>
>
>
> 
>
> 
>
> <fieldType name="text_zh" ... positionIncrementGap="100">
>   [analyzer chain stripped by the archive; it included a StopFilter with
>   words="org/apache/lucene/analysis/cn/smart/stopwords.txt"]
> </fieldType>
>
> 
>
>
>
> No error when I reload my core.
>
>
>
> But I can’t index Chinese data, I get this error:
>
>
>
> POSTing file CN-0005.xml (application/xml) to [base]
>
> SimplePostTool: WARNING: Solr returned an error #400 (Bad Request) for url:
> http:///solr/yyy/update
>
> SimplePostTool: WARNING: Response: <?xml version="1.0" encoding="UTF-8"?>
>
> <response>
>   <lst name="responseHeader">
>     <int name="status">400</int>
>     <int name="QTime">1</int>
>   </lst>
>   <lst name="error">
>     <lst name="metadata">
>       <str name="error-class">org.apache.solr.common.SolrException</str>
>       <str name="root-error-class">java.lang.IllegalArgumentException</str>
>     </lst>
>     <str name="msg">Exception writing document id CN112091782A to the
>     index; possible analysis error: cannot change field "tizh" from index
>     options=DOCS to inconsistent index
>     options=DOCS_AND_FREQS_AND_POSITIONS</str>
>     <int name="code">400</int>
>   </lst>
> </response>
>
> SimplePostTool: WARNING: IOException while reading response:
> java.io.IOException: Server returned HTTP response code: 400 for URL:
> http:///solr/yyy/update
>
>
>
> Thanks a lot for your help,
>
> Bruno
>
>
>
>
>
> --
> L'absence de virus dans ce courrier électronique a été vérifiée par le
> logiciel antivirus Avast.
> https://www.avast.com/antivirus
>





[Solr8.7] Chinese ZH language ?

2021-01-10 Thread Bruno Mannina
Hello,



I would like to index simplified chinese ZH language (i.e. 一种新型太阳能坪
床增温系统),

I added in my solrconfig the lib:

<lib dir="${solr.install.dir:../../..}/contrib/analysis-extras/lucene-libs/"
     regex="lucene-analyzers-smartcn-8\.7\.0\.jar" />

First question: Is it enough ?



But now I need your help to define the fieldtype "text_zh" in my
schema.xml, to use with:

(PS: As with other fields, I need highlighting)

<field name="tizh" type="text_zh" indexed="true" stored="true"
       termVectors="true" termPositions="true" termOffsets="true"/>
And

    [fieldType XML stripped by the mailing-list archive renderer; it
    declared a "text_zh" type with a smartcn analyzer chain]


No error when I reload my core.



But I can’t index Chinese data, I get this error:



POSTing file CN-0005.xml (application/xml) to [base]

SimplePostTool: WARNING: Solr returned an error #400 (Bad Request) for url:
http:///solr/yyy/update

SimplePostTool: WARNING: Response: <?xml version="1.0" encoding="UTF-8"?>

<response>
  <lst name="responseHeader">
    <int name="status">400</int>
    <int name="QTime">1</int>
  </lst>
  <lst name="error">
    <lst name="metadata">
      <str name="error-class">org.apache.solr.common.SolrException</str>
      <str name="root-error-class">java.lang.IllegalArgumentException</str>
    </lst>
    <str name="msg">Exception writing document id CN112091782A to the index;
    possible analysis error: cannot change field "tizh" from index options=DOCS
    to inconsistent index options=DOCS_AND_FREQS_AND_POSITIONS</str>
    <int name="code">400</int>
  </lst>
</response>


SimplePostTool: WARNING: IOException while reading response:
java.io.IOException: Server returned HTTP response code: 400 for URL:
http:///solr/yyy/update



Thanks a lot for your help,

Bruno







RE: [Solr8.7] Indexing only some language ?

2021-01-10 Thread Bruno Mannina
Perfect! Thanks!

-----Original Message-----
From: xiefengchang [mailto:fengchang_fi...@163.com]
Sent: Sunday, January 10, 2021 04:50
To: solr-user@lucene.apache.org
Subject: Re: [Solr8.7] Indexing only some language ?

Take a look at the document here:
https://lucene.apache.org/solr/guide/8_7/dynamic-fields.html#dynamic-fields


here's the point: "a field that does not match any explicitly defined fields
can be matched with a dynamic field."


so I guess the priority is quite clear~

















At 2021-01-10 03:38:01, "Bruno Mannina"  wrote:
>Hello,
>
>
>
>I would like to define in my schema.xml some text_xx fields.
>
>I have patent titles in several languages.
>
>Only 6 of them (EN, IT, FR, PT, ES, DE) interest me.
>
>
>
>I know how to define these 6 fields, I use text_en, text_it etc.
>
>
>
>i.e. for English language:
>
><field name="tien" type="text_en" indexed="true"
>stored="true" termVectors="true" termPositions="true"
>termOffsets="true"/>
>
>
>
>But I have more than 6 languages like: AR, CN, JP, KR etc.
>
>I can't analyze all source files to detect all languages and define
>them in my schema.
>
>
>
>I would like to use a dynamic field to index other languages.
>
><dynamicField name="ti*" ... indexed="true" stored="true"
>omitTermFreqAndPositions="true" omitNorms="true"/>
>
>
>
>Is it ok to do that?
>
>Will the TIEN field be indexed twice internally, or since tien is already
>defined, will ti* not process tien?
>
>
>
>Thanks for your kind reply,
>
>
>
>Sincerely
>
>Bruno
>
>
>
>
>
>
>
>
>





[Solr8.7] Indexing only some language ?

2021-01-09 Thread Bruno Mannina
Hello,



I would like to define in my schema.xml some text_xx fields.

I have patent titles in several languages.

Only 6 of them (EN, IT, FR, PT, ES, DE) interest me.



I know how to define these 6 fields, I use text_en, text_it etc.



i.e. for English language:

<field name="tien" type="text_en" indexed="true" stored="true"
       termVectors="true" termPositions="true" termOffsets="true"/>

But I have more than 6 languages like: AR, CN, JP, KR etc.

I can't analyze all source files to detect all languages and define them in
my schema.



I would like to use a dynamic field to index other languages.

<dynamicField name="ti*" ... indexed="true" stored="true"
              omitTermFreqAndPositions="true" omitNorms="true"/>

Is it ok to do that?

Will the TIEN field be indexed twice internally, or since tien is already
defined, will ti* not process tien?
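A hedged sketch of the two declarations side by side; the type names are assumptions, since the original definitions were stripped by the archive. An explicitly declared field takes precedence over a dynamic-field pattern, so tien is indexed exactly once, by its own definition, and ti* only catches the other languages:

```xml
<!-- Hedged sketch; type names are assumptions.  The explicit field wins
     over the dynamic pattern for documents containing "tien". -->
<field name="tien" type="text_en" indexed="true" stored="true"/>
<dynamicField name="ti*" type="text_general" indexed="true" stored="true"
              omitTermFreqAndPositions="true" omitNorms="true"/>
```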



Thanks for your kind reply,



Sincerely

Bruno











RE: Solr8.7 Munin ?

2020-11-23 Thread Bruno Mannina
Ok thanks for this help !

-----Original Message-----
From: Bernd Fehling [mailto:bernd.fehl...@uni-bielefeld.de]
Sent: Monday, November 23, 2020 10:46
To: solr-user@lucene.apache.org
Subject: Re: Solr8.7 Munin ?

Hi Bruno,

yes, I use munin-solr plugin.
https://github.com/averni/munin-solr

I renamed it to solr_*.py on my servers.

Regards
Bernd


On 23.11.20 at 09:54, Bruno Mannina wrote:
> Hello Bernd,
>
> Do you use a specific plugin for Solr?
>
> Thanks,
> Bruno
>
> -----Original Message-----
> From: Bernd Fehling [mailto:bernd.fehl...@uni-bielefeld.de]
> Sent: Monday, November 23, 2020 09:02
> To: solr-user@lucene.apache.org
> Subject: Re: Solr8.7 Munin ?
>
> We are using Munin for years now for Solr monitoring.
> Currently Munin 2.0.40 and SolrCloud 6.6.
>
> Regards
> Bernd
>
>
On 20.11.20 at 21:02, Matheo Software wrote:
>> Hello,
>>
>>
>>
>> I would like to use Munin to check my Solr 8.7, but it doesn't work. I
>> tried to configure the munin plugins without success.
>>
>>
>>
>> Does somebody use Munin with a recent version of Solr? (version > 5.4)
>>
>>
>>
>> Thanks a lot,
>>
>>
>>
>> Cordialement, Best Regards
>>
>> Bruno Mannina
>>
>><http://www.matheo-software.com> www.matheo-software.com
>>
>><http://www.patent-pulse.com> www.patent-pulse.com
>>
>> Tél. +33 0 970 738 743
>>
>> Mob. +33 0 634 421 817
>>
>>
>>
>>
>>
>>
>
>

--
*
Bernd Fehling             Bielefeld University Library
Dipl.-Inform. (FH)        LibTec - Library Technology
Universitätsstr. 25  and Knowledge Management
33615 Bielefeld
Tel. +49 521 106-4060   bernd.fehling(at)uni-bielefeld.de
   https://www.ub.uni-bielefeld.de/~befehl/

BASE - Bielefeld Academic Search Engine - www.base-search.net
*





RE: Solr8.7 Munin ?

2020-11-23 Thread Bruno Mannina
Hello Bernd,

Do you use a specific plugin for Solr?

Thanks,
Bruno

-----Original Message-----
From: Bernd Fehling [mailto:bernd.fehl...@uni-bielefeld.de]
Sent: Monday, November 23, 2020 09:02
To: solr-user@lucene.apache.org
Subject: Re: Solr8.7 Munin ?

We are using Munin for years now for Solr monitoring.
Currently Munin 2.0.40 and SolrCloud 6.6.

Regards
Bernd


On 20.11.20 at 21:02, Matheo Software wrote:
> Hello,
>
>
>
> I would like to use Munin to check my Solr 8.7, but it doesn't work. I
> tried to configure the munin plugins without success.
>
>
>
> Does somebody use Munin with a recent version of Solr? (version > 5.4)
>
>
>
> Thanks a lot,
>
>
>
> Cordialement, Best Regards
>
> Bruno Mannina
>
>   <http://www.matheo-software.com> www.matheo-software.com
>
>   <http://www.patent-pulse.com> www.patent-pulse.com
>
> Tél. +33 0 970 738 743
>
> Mob. +33 0 634 421 817
>
>
>
>
>
>





RE: Solr8.7 How to increase JVM-Memory ?

2020-11-18 Thread Bruno Mannina
Yes ! It works !

I just set SOLR_HEAP="8g" and restarted Solr.
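For reference, the setting as it would typically appear in solr.in.sh (the path `/etc/default/solr.in.sh` is the one mentioned below for a service install; the value is the one from this thread):

```shell
# Heap setting as typically placed in solr.in.sh
# (/etc/default/solr.in.sh on a service install).
SOLR_HEAP="8g"
```

After changing it, restart Solr (e.g. `service solr restart`) and confirm the new heap on the Dashboard.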

Thanks a lot !

-----Original Message-----
From: Jörn Franke [mailto:jornfra...@gmail.com]
Sent: Wednesday, November 18, 2020 18:50
To: solr-user@lucene.apache.org
Subject: Re: Solr8.7 How to increase JVM-Memory ?

I think it should be /etc/default/solr.in.sh, and executable for the user
executing Solr.

> On 18.11.2020 at 16:44, Bruno Mannina wrote:
>
> Yes. It was executable.
>
> Must I create solr.in.sh by copying solr.in.sh.orig? Is that the right
> way?
>
> -----Original Message-----
> From: Jörn Franke [mailto:jornfra...@gmail.com]
> Sent: Wednesday, November 18, 2020 16:41
> To: solr-user@lucene.apache.org
> Subject: Re: Solr8.7 How to increase JVM-Memory ?
>
> Did you make solr.in.sh executable ? Eg chmod a+x solr.in.sh ?
>
>> On 18.11.2020 at 16:33, Matheo Software wrote:
>>
>> 
>> Hi All,
>>
>> For several years I have worked with an old version of Solr on Ubuntu, version 5.4.
>> Today I test the 8.7 version.
>> But I’m not able to change the JVM-Memory like in my 5.4 version.
>>
>> Many answers on the web say to modify the solr.in.sh file, but in my case I
>> have only /opt/solr/solr.in.sh.orig.
>> And if I change SOLR_HEAP to a new value like "8g", the Dashboard always
>> shows 512MB.
>> I also tried to change SOLR_JAVA_MEM, without success.
>>
>> Of course, I restart Solr each time (service solr restart). No error in the
>> log. All works fine, but no 8g of memory.
>>
>> I also tried copying solr.in.sh.orig to solr.in.sh; the result is always the
>> same.
>>
>> Could you help me ?
>>
>> Cordialement, Best Regards
>> Bruno Mannina
>> www.matheo-software.com
>> www.patent-pulse.com
>> Tél. +33 0 970 738 743
>> Mob. +33 0 634 421 817
>>
>>
>>
>
>
>





RE: Solr8.7 How to increase JVM-Memory ?

2020-11-18 Thread Bruno Mannina
Yes. It was executable.

Must I create solr.in.sh by copying solr.in.sh.orig? Is that the right
way?

-----Original Message-----
From: Jörn Franke [mailto:jornfra...@gmail.com]
Sent: Wednesday, November 18, 2020 16:41
To: solr-user@lucene.apache.org
Subject: Re: Solr8.7 How to increase JVM-Memory ?

Did you make solr.in.sh executable ? Eg chmod a+x solr.in.sh ?

> On 18.11.2020 at 16:33, Matheo Software wrote:
>
> 
> Hi All,
>
> For several years I have worked with an old version of Solr on Ubuntu, version 5.4.
> Today I test the 8.7 version.
> But I’m not able to change the JVM-Memory like in my 5.4 version.
>
> Many answers on the web say to modify the solr.in.sh file, but in my case I
> have only /opt/solr/solr.in.sh.orig.
> And if I change SOLR_HEAP to a new value like "8g", the Dashboard always
> shows 512MB.
> I also tried to change SOLR_JAVA_MEM, without success.
>
> Of course, I restart Solr each time (service solr restart). No error in the
> log. All works fine, but no 8g of memory.
>
> I also tried copying solr.in.sh.orig to solr.in.sh; the result is always the
> same.
>
> Could you help me ?
>
> Cordialement, Best Regards
> Bruno Mannina
> www.matheo-software.com
> www.patent-pulse.com
> Tél. +33 0 970 738 743
> Mob. +33 0 634 421 817
>
>
>





RE: Is Solr can do that ?

2019-06-24 Thread Bruno Mannina
Hi Toke,

Thanks for sharing this experience; it's very useful for me to get a first
overview of what I will need.
To summarize, I will need:
- to learn about Tika
- to ask a lot of questions, like the frequency of add/update of Solr data
- the number of users
- CPU/RAM/HDD
- a first test with a representative sample
And of course a good expertise :)

Thanks,
Bruno


-----Original Message-----
From: Toke Eskildsen [mailto:t...@kb.dk]
Sent: Saturday, June 22, 2019 11:36
To: solr_user lucene_apache
Subject: Re: Is Solr can do that ?

Matheo Software Info  wrote:
> My question is very simple ☺ I would like to know if Solr can process
> around 30To of data (Pdf, Text, Word, etc…) ?

Simple answer: Yes. Assuming 30To means 30 terabyte.

> What is the best way to index this huge data ? several servers ?
> several shards ? other ?

As other participants have mentioned, it is hard to give numbers. What we can
do is share experience.

We are doing webarchive indexing and I guess there would be quite an overlap 
with your content as we also use Tika. One difference is that the images in a 
webarchive are quite cheap to index, so you'll probably need (relatively) more 
hardware than we use. Very roughly we used 40 CPU-years to index 600 (700? I 
forget) TB of data in one of our runs. Scaling to your 30TB this suggests 
something like 2 CPU-years, or a couple of months for a 16 core machine.
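A quick back-of-the-envelope check of the scaling above. It assumes cost scales linearly with corpus size, which, as the paragraph says, is only a ballpark:

```python
# Scale the reported 40 CPU-years for ~600 TB down to 30 TB, then spread
# the work over a 16-core machine.  Linear scaling is an assumption.
cpu_years_for_600tb = 40
cpu_years = cpu_years_for_600tb * 30 / 600   # 600 TB -> 30 TB
months_on_16_cores = cpu_years / 16 * 12     # wall-clock on 16 cores
print(cpu_years, months_on_16_cores)
```

This reproduces the "about 2 CPU-years, or a couple of months on 16 cores" estimate.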

This is just to get a ballpark: You will do yourself a huge favor by building a 
test-setup and process 1 TB or so of your data to get _your_ numbers, before 
you design your indexing setup. It is our experience that the analyzing part 
(Tika) takes much more power than the Solr indexing part: At our last run we 
had 30-40 CPU-cores doing Tika (and related analysis) feeding into a Solr 
running on a 4-core machine on spinning drives.


As for Solr setup for search, then you need to describe in detail what your 
requirements are, before we can give you suggestions. Is the index updated all 
the time, in batches or one-off? How many concurrent users? Are the searches 
interactive or batch-jobs? What kind of aggregations do you need?

In our setup we build separate collections that are merged to single segments 
and never updated. Our use varies between very few interactive users and a lot 
of batch jobs. Scaling this specialized setup to your corpus size would require 
about 3TB of SSD, 64GB RAM and 4 CPU-cores, divided among 4 shards. You are 
likely to need quite a lot more than that, so this is just to say that at this 
scale the use of the index matters _a lot_.

- Toke Eskildsen





RE: Is Solr can do that ?

2019-06-24 Thread Bruno Mannina
Hello Erick,

Well I do not know TIKA, I will of course study it.

Thanks for the info concerning solrj and Tika.

Bruno

-----Original Message-----
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: Friday, June 21, 2019 19:10
To: solr-user@lucene.apache.org
Subject: Re: Is Solr can do that ?

What Sam said.

Here’s something to get you started on how and why it’s better to be using Tika 
rather than shipping the docs to Solr and having ExtractingRequestHandler do it 
on Solr: https://lucidworks.com/2012/02/14/indexing-with-solrj/

Best,
Erick

> On Jun 21, 2019, at 9:56 AM, Samuel Kasimalla  wrote:
>
> Hi Bruno,
>
> Assuming you meant 30TB, the first step is to use the Tika parser to
> convert the rich documents into plain text.
>
> We need the number of documents; the unofficial word on the street is
> about 50 million documents per shard. Of course a lot of parameters are
> involved in this; it's a simple question, but the answer is not so simple :).
>
> Hope this helps.
>
> Thanks
> Sam
> https://www.linkedin.com/in/skasimalla/
>
> On Fri, Jun 21, 2019 at 12:49 PM Matheo Software Info <
> i...@matheo-software.com> wrote:
>
>> Dear Solr User,
>>
>>
>>
>> My question is very simple :) I would like to know if Solr can process
>> around 30To of data (Pdf, Text, Word, etc…) ?
>>
>>
>>
>> What is the best way to index this huge data ? several servers ?
>> several shards ? other ?
>>
>>
>>
>> Many thanks for your information,
>>
>>
>>
>>
>>
>> Cordialement, Best Regards
>>
>> Bruno Mannina
>>
>> www.matheo-software.com
>>
>> www.patent-pulse.com
>>
>> Tél. +33 0 970 738 743
>>
>> Mob. +33 0 634 421 817
>>
>>
>>
>>
>>





RE: Is Solr can do that ?

2019-06-24 Thread Bruno Mannina
Hello Shawn,

Good news that Solr can do that.

I know that with 30TB of data, hardware will be the first thing to get.
Concerning expertise, that's the real problem for me.

First, I think I will run several tests to see how Solr works with
non-XML documents (I only have experience with XML documents).

Thanks,
Bruno

On 6/21/2019 10:32 AM, Matheo Software Info wrote:
> My question is very simple :) I would like to know if Solr can process
> around 30To of data (Pdf, Text, Word, etc.) ?
>
> What is the best way to index this huge data ? several servers ?
> several shards ? other ?

Sure, Solr can do that.  Whether you have enough resources or expertise
available to accomplish it is an entirely different question.

Handling that much data is likely going to require a LOT of expensive
hardware.  The index will almost certainly need to be sharded.  Knowing
exactly what numbers are involved is impossible with the information
available ... and even with more information, it will most likely require
experimentation with your actual data to find an optimal solution.

https://lucidworks.com/2012/07/23/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/

Thanks,
Shawn





RE: Is Solr can do that ?

2019-06-24 Thread Bruno Mannina
Hello Sam,

First, thanks for your answer.

I don't yet know the number of documents; I just know that it will be Text,
Pdf, Word, Xls, etc...
I will try to get more info about the number of documents.

I don't know Tika; I will investigate it.

Thanks,
Bruno


-----Original Message-----
From: Samuel Kasimalla [mailto:skasima...@gmail.com]
Sent: Friday, June 21, 2019 18:56
To: solr-user@lucene.apache.org
Subject: Re: Is Solr can do that ?

Hi Bruno,

Assuming you meant 30TB, the first step is to use the Tika parser to convert
the rich documents into plain text.

We need the number of documents; the unofficial word on the street is about
50 million documents per shard. Of course a lot of parameters are involved in
this; it's a simple question, but the answer is not so simple :).

Hope this helps.

Thanks
Sam
https://www.linkedin.com/in/skasimalla/

On Fri, Jun 21, 2019 at 12:49 PM Matheo Software Info < 
i...@matheo-software.com> wrote:

> Dear Solr User,
>
>
>
> My question is very simple :) I would like to know if Solr can process
> around 30To of data (Pdf, Text, Word, etc…) ?
>
>
>
> What is the best way to index this huge data ? several servers ?
> several shards ? other ?
>
>
>
> Many thanks for your information,
>
>
>
>
>
> Cordialement, Best Regards
>
> Bruno Mannina
>
> www.matheo-software.com
>
> www.patent-pulse.com
>
> Tél. +33 0 970 738 743
>
> Mob. +33 0 634 421 817
>
>
>
>
>
>





Clustering on a Query grouped ?

2019-05-20 Thread Bruno Mannina
Dear Solr Users,



I would like to know if it's possible to do clustering on a grouped query.



In my project, I get only one document per group (because all the other
documents from the same group are just equivalents with a different Id).

So I want to do clustering with this result.



Having only one document per group allows me to increase the max number of
different documents used to create clusters.

Each document has a Family Id. This field is used to create the groups.



Thanks a lot for your help.



Cordialement, Best Regards

Bruno Mannina

 <http://www.matheo-software.com/> www.matheo-software.com

 <http://www.patent-pulse.com/> www.patent-pulse.com

Tél. +33 0 970 738 743

Mob. +33 0 634 421 817







RE: Schema.xml, copyField, Slash, ignoreCase ?

2019-01-14 Thread Bruno Mannina
Hi Erick,

Thanks for the tip Admin>>UI>>(core)>>analysis, I will investigate this 
afternoon.

Regards,
Bruno

-----Original Message-----
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: Friday, January 11, 2019 17:18
To: solr-user
Subject: Re: Schema.xml, copyField, Slash, ignoreCase ?

The admin UI>>(select a core)>>analysis page is your friend here. It'll show 
you exactly what each filter in your analysis chain does and from there you'll 
need to mix and match filters, your tokenizer and the like to support the 
use-cases you need.

My guess is that the field type you're using contains 
WordDelimiterFilterFactory which is splitting up on the slash.
Similarly for your aribag/airbags problem, probably you have one of the 
stemmers in your analysis chain.

See "Filter Descriptions" in your version of the ref guide.

And one caution: The admin>>core>>analysis chain shows you what happens _after_ 
query parsing. So if you enter (without quotes) "bing bong" those tokens will 
be shown. What fools people is that the query _parser_ gets in there first, so 
they'll then wonder why field:bing bong doesn't work. It's because the parser 
made it into field:bing default_field:bong. So you'll still (potentially) have 
to quote or escape some terms on input, it depends on the query parser you're 
using.

Best,
Erick
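For illustration only (this is a hedged sketch, not the specific fix Erick or Steve proposed): a type whose tokenizer does not split on punctuation keeps a code such as B65D81/28 as a single token, and a lowercase filter makes matching case-insensitive. Verify any such type on the Admin UI analysis page before relying on it:

```xml
<!-- Hedged sketch: whitespace tokenization keeps "B65D81/28" as one token;
     the lowercase filter gives case-insensitive matching. -->
<fieldType name="text_code" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```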

On Fri, Jan 11, 2019 at 1:40 AM Bruno Mannina  
wrote:
>
> Hello,
>
>
>
> I'm facing a problem concerning the default field "text" (Solr 5.4)
> and queries which contain / (slash).
>
>
>
> I need to have default “text” field with:
>
> - ignoreCase,
>
> - no auto truncation,
>
> - process slash char
>
>
>
> I would like to perform only query on the field “text”
>
> Queries can contain:  code or keywords or both.
>
>
>
> I have 2 fields named symbol and title, and 1 alias ti (old field that 
> I can’t delete or modify)
>
>
>
> * Symbol contains codes with a slash (i.e. A62C21/02)
>
> <field name="symbol" ... required="true" stored="true"/>
>
>
>
> * Title contains English text and also symbol
>
> <field name="title" ... stored="true" termVectors="true"
>        termPositions="true" termOffsets="true"/>
>
>
>
> { "symbol": "B65D81/20",
>
> "title": [
>
>  "under vacuum or superatmospheric pressure, or in a special 
> atmosphere, e.g. of inert gas  {(B65D81/28  takes precedence; 
> containers with pressurising means for maintaining ball pressure A63B39/025)} 
> "
>
> ]}
>
>
>
> * Ti is an alias of title
>
> <field name="ti" ... stored="true" termVectors="true"
>        termPositions="true" termOffsets="true"/>
>
>
>
> * Text is
>
> <field name="text" ... multiValued="true"/>
>
>
>
> - Aliases are:
>
> [four copyField/alias declarations stripped by the archive]
>
>
>
>
>
> If I do these queries:
>
> * ti:airbag -> it's ok
> * title:airbag -> not good for me, because it found airbags
> * ti:b65D81/28 -> not good, debug shows ti:b65d81 OR ti:28
> * ti:"b65D81/28" -> it's ok
> * symbol:b65D81/28 -> it's ok (even without " ")
>
> NOW with the "text" field:
>
> * b65D81/28 -> not good, debug shows text:b65d81 OR text:28
> * airbag -> it's ok
> * "b65D81/28" -> it's ok
>
>
>
> It will be great if I can enter symbol without “ “
>
>
>
> Could you help me to have a text field which solve this problem ? 
> (please find below all def of my fields)
>
>
>
> Many thanks for your help.
>
>
>
> String_ci is my own definition
>
>
>
>  sortMissingLast="true" omitNorms="true">
>
> 
>
>   
>
>   
>
> 
>
> 
>
>
>
>  positionIncrementGap="100" multiValued="true">
>
>   
>
> 
>
>  words="stopwords.txt" />
>
> 
>
>   
>
>   
>
> 
>
>  words="stopwords.txt" />
>
>  ignoreCase="true" expand="true"/>
>
> 
>
>   
>
> 
>
>
>
>  positionIncrementGap="100">
>
>   
>
> 
>
>  words="lang/stopwords_en.txt"/>
>
> 
>
> 
>
>  protected="protwords.txt"/>
>
> 
>
>   
>
>   
>
> 
>
>  ignoreCase="true" expand="true"/>
>
>  words="lang/stopwords_en.txt"/>
>
> 
>
> 
>
> protected="protwords.txt"/>
>
> 
>
>   
>
> 
>
>
>
>
>
> Best Regards
>
> Bruno
>
>
>
>
>
> ---
> This email has been checked for viruses by Avast antivirus software.
> https://www.avast.com/antivirus



RE: Schema.xml, copyField, Slash, ignoreCase ?

2019-01-14 Thread Bruno Mannina
Hi Steve,

Many thanks for this field, I will test it this afternoon in my dev' server.

Thanks also for your explanation !

Have a nice day !

Bruno

-----Original Message-----
From: Steve Rowe [mailto:sar...@gmail.com] 
Sent: Friday, January 11, 2019 17:43
To: solr-user@lucene.apache.org
Subject: Re: Schema.xml, copyField, Slash, ignoreCase ?

Hi Bruno,

ignoreCase: Looks like you already have achieved this?

auto truncation: This is caused by inclusion of PorterStemFilterFactory in your 
"text_en" field type.  If you don't want its effects (i.e. treating different 
forms of the same word interchangeably), remove the filter.

process slash char: I think you want the slash to be included in symbol terms 
rather than interpreted as a term separator.  One way to achieve this is to 
first, pre-tokenization, convert the slash to a string that does not include a 
term separator, and then post-tokenization, convert the substituted string back 
to a slash.

Here's a version of your text_en that uses PatternReplaceCharFilterFactory[1] 
to convert slashes inside of symbol-ish terms (the pattern is a guess based on 
the symbol text you've provided; you'll likely need to adjust it) to "_": a 
string unlikely to otherwise occur, and which will not be interpreted by 
StandardTokenizer as a term separator; and then PatternReplaceFilterFactory[1] 
to convert "_" back to slashes.  Note that the patterns for the two are 
slightly different, since the *char filter* is given as input the entire field 
text, while the *filter* is given the text of single terms.

-

  








  
  









  

-

[1] 
http://archive.apache.org/dist/lucene/solr/ref-guide/apache-solr-ref-guide-5.4.pdf

--
Steve
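Outside Solr, the round trip Steve describes can be sanity-checked with plain Python regexes. The symbol pattern below is a guess from the sample codes (as noted above, it will likely need adjusting), and the "_" sentinel assumes it never occurs in real field text:

```python
import re

# char filter (pre-tokenization): protect "/" inside symbol-like codes,
# e.g. B65D81/28, so the tokenizer keeps the code as one term
SYMBOL = re.compile(r'\b([A-H][0-9]{2}[A-Z][0-9]+)/([0-9]+)\b')

def char_filter(text: str) -> str:
    return SYMBOL.sub(r'\1_\2', text)

# token filter (post-tokenization): restore "/" inside each token
def token_filter(token: str) -> str:
    return token.replace('_', '/')

text = "containers with pressurising means B65D81/28 takes precedence"
protected = char_filter(text)
tokens = [token_filter(t) for t in protected.split()]
print(tokens)
# ['containers', 'with', 'pressurising', 'means', 'B65D81/28', 'takes', 'precedence']
```

StandardTokenizer then sees B65D81_28 as a single term, and the token filter restores the slash after tokenization.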


> On Jan 11, 2019, at 4:18 AM, Bruno Mannina  
> wrote:
> 
> I need to have default “text” field with:
> 
> - ignoreCase,
> 
> - no auto truncation,
> 
> - process slash char
> 
> 
> 
> I would like to perform only query on the field “text”
> 
> Queries can contain:  code or keywords or both.
> 
> 
> 
> I have 2 fields named symbol and title, and 1 alias ti (old field that 
> I can’t delete or modify)
> 
> 
> 
> * Symbol contains code with slash (i.e A62C21/02)
> 
>  required="true" stored="true"/>
> 
> 
> 
> * Title contains English text and also symbol
> 
> stored="true" termVectors="true" termPositions="true" 
> termOffsets="true"/>
> 
> 
> 
> { "symbol": "B65D81/20",
> 
> "title": [
> 
> "under vacuum or superatmospheric pressure, or in a special 
> atmosphere, e.g. of inert gas  {(B65D81/28  takes precedence; 
> containers with pressurising means for maintaining ball pressure A63B39/025)} 
> "
> 
> ]}
> 
> 
> 
> * Ti is an alias of title
> 
> stored="true" termVectors="true" termPositions="true" 
> termOffsets="true"/>
> 
> 
> 
> * Text is
> 
>  multiValued="true"/>
> 
> 
> 
> - Alias are:
> 
> 
> 
>
> 
>
> 
>
> 
>
> 
> 
> 
> 
> 
> If I do these queries :
> 
> 
> 
> * ti:airbag → it’s ok
>
> * title:airbag → not good for me because it finds airbags
>
> * ti:b65D81/28 → not good, debug shows ti:b65d81 OR ti:28
>
> * ti:”b65D81/28” → it’s ok
>
> * symbol:b65D81/28 → it’s ok (even without quotes)
>
>
>
> NOW with the “text” field:
>
> * b65D81/28 → not good, debug shows text:b65d81 OR text:28
>
> * airbag → it’s ok
>
> * “b65D81/28” → it’s ok
>
>
>
> It would be great if I could enter a symbol without quotes.
>
>
>
> Could you help me build a text field that solves this problem? 
> (please find below the definitions of all my fields)
> 
> 
> 
> Many thanks for your help.
> 
> 
> 
> String_ci is my own definition
> 
> 
> 
> sortMissingLast="true" omitNorms="true">
> 
>
> 
>  
> 
>  
> 
>
> 
>
> 
> 
> 
> positionIncrementGap="100" multiValued="true">
> 
>  
> 
>
> 
> words="stopwords.txt" />
> 
>
> 
>  
> 
>  
> 
>
> 
> words="stopwords.txt" />
> 
> ignoreCase="true" expand="true"/>
> 
>
> 
>  
> 
>
> 
> 
> 
> positionIncrementGap="100">
> 
>  
> 
>
> 
> words="lang/stopwords_en.txt"/>
> 
>
> 
>
> 
> protected="protwords.txt"/>
> 
>
> 
>  
> 
>  
> 
>
> 
> ignoreCase="true" expand="true"/>
> 
> words="lang/stopwords_en.txt"/>
> 
>
> 
>
> 
>protected="protwords.txt"/>
> 
>
> 
>  
> 
>
> 
> 
> 
> 
> 
> Best Regards
> 
> Bruno
> 
> 
> 
> 
> 



Schema.xml, copyField, Slash, ignoreCase ?

2019-01-11 Thread Bruno Mannina
Hello,



I’m facing a problem concerning the default field “text” (SOLR 5.4) and
queries that contain a / (slash).



I need to have default “text” field with:

- ignoreCase,

- no auto truncation,

- process slash char



I would like to perform only query on the field “text”

Queries can contain:  code or keywords or both.



I have 2 fields named symbol and title, and 1 alias ti (old field that I
can’t delete or modify)



* Symbol contains code with slash (i.e A62C21/02)





* Title contains English text and also symbol





{ "symbol": "B65D81/20",

"title": [

 "under vacuum or superatmospheric pressure, or in a special atmosphere,
e.g. of inert gas  {(B65D81/28  takes precedence; containers with
pressurising means for maintaining ball pressure A63B39/025)} "

]}



* Ti is an alias of title





* Text is





- Alias are:















If I do these queries :



* ti:airbag → it’s ok

* title:airbag → not good for me because it finds airbags

* ti:b65D81/28 → not good, debug shows ti:b65d81 OR ti:28

* ti:”b65D81/28” → it’s ok

* symbol:b65D81/28 → it’s ok (even without quotes)



NOW with the “text” field:

* b65D81/28 → not good, debug shows text:b65d81 OR text:28

* airbag → it’s ok

* “b65D81/28” → it’s ok



It would be great if I could enter a symbol without quotes.



Could you help me build a text field that solves this problem? (please
find below the definitions of all my fields)



Many thanks for your help.



String_ci is my own definition







  

  









  







  

  









  







  













  

  











   



  







Best Regards

Bruno







RE: Nested Documents without using "type" field ? Possible or Not ?

2018-12-06 Thread Bruno Mannina
Dear All,

Is nobody able to tell me whether this structure can be queried with all parents?

Sorry for this second message,

Sincerely,
Bruno

-----Original Message-----
From: Bruno Mannina [mailto:bmann...@matheo-software.com] 
Sent: Wednesday, December 5, 2018 11:33
To: solr-user@lucene.apache.org
Subject: Nested Documents without using "type" field ? Possible or Not ?

Hello,

 

I would like to use SOLR to index the Cooperative Patent Classification,

The CPC has a hierarchical structure that can have more than 20 levels.

It's a basic structure without a type for nested docs.

i.e: 

A -> A01 -> A01B -> A01B3/00 -> A01B3/40 -> A01B3/4025 .

A -> A01 -> A01L -> A01L1/00 -> A01L1/012 -> A01L1/0125 .

B -> B05 -> B05C -> B05C10/00 -> B05C10/87

 

Important: Each "Code" has a Definition (free text used to explain the code).

A record is: Code + Definition

 

- I already indexed this CPC structure with a XML format it works fine.

- With this kind of structure I can't set a type of nested doc.

- A keyword that a user searches for can be found at several levels (parent and
child)

 

So, my tests:

If I set a field named "typedoc" with "parentDoc" or "leaf", 

I'm facing an error when I used ParentFilter, ChildFilter, etc. 

"Child query must only match non-parent docs"

q={!parent which="typedoc:parentDoc"}ti:details

fq=*,[child parentFilter="typedoc:parentDoc" childFilter="ti:details"]

 

I need to have the whole structure when I do a query. I mean, I need to have 
all parents until level=1

 

My question is:

 

Has someone already indexed and used this kind of structure?

All the information I found uses a typedoc field.

 

Thanks for your help !

 

Cordialement, Best Regards

Bruno
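An alternative that sidesteps block joins entirely: because most of the CPC hierarchy is encoded in the symbol itself, the ancestors can be computed at index time and stored in a multivalued field (an assumed field, e.g. "ancestors"), so fetching "all parents up to level 1" becomes a plain terms query. A sketch; this is a simplification, since intermediate subgroup levels such as A01B3/40 are defined by dot-depth in the scheme, not by the symbol string, and would have to come from the data itself:

```python
def cpc_ancestors(symbol: str) -> list[str]:
    """Ancestor codes for a CPC-like symbol, shortest first.
    E.g. 'A01B3/4025' -> ['A', 'A01', 'A01B', 'A01B3/00'] (simplified)."""
    anc = [symbol[:1], symbol[:3], symbol[:4]]   # section, class, subclass
    if '/' in symbol:
        group = symbol.split('/')[0]             # e.g. A01B3
        if group != symbol[:4]:
            anc.append(group + '/00')            # main group
    # drop duplicates and the symbol itself, keeping order
    seen, out = set(), []
    for a in anc:
        if a and a != symbol and a not in seen:
            seen.add(a)
            out.append(a)
    return out

print(cpc_ancestors("A01B3/4025"))  # ['A', 'A01', 'A01B', 'A01B3/00']
```

At query time, fetching the whole ancestry of a matched doc is then q=id:(A OR A01 OR A01B OR ...) built from the stored field, with no parent/child machinery involved.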






RE: Query regarding Dynamic Fields

2018-12-05 Thread Bruno Mannina
Hi Jay,

In my case, I created a copyField for this.

i.e.


And of course define ABC before




-----Original Message-----
From: jay harkhani [mailto:jay.harkh...@hotmail.com]
Sent: Wednesday, December 5, 2018 13:29
To: solr-user@lucene.apache.org
Subject: Query regarding Dynamic Fields

Hello All,


We are using dynamic fields in our collection. We want to use them in a query to
fetch records. Can someone please advise?

i.e.: q=ABC_*:"myValue"

Here "ABC_*" is a dynamic field. Currently, when we provide the field name
as above it gives "org.apache.solr.search.SyntaxError". It only returns data
when we provide the actual field name.

Thank you for any help you can offer.

Regards,
Jay Harkhani.





Nested Documents without using "type" field ? Possible or Not ?

2018-12-05 Thread Bruno Mannina
Hello,



I would like to use SOLR to index the Cooperative Patent Classification,

The CPC has a hierarchical structure that can have more than 20 levels.

It's a basic structure without a type for nested docs.

i.e:

A -> A01 -> A01B -> A01B3/00 -> A01B3/40 -> A01B3/4025 .

A -> A01 -> A01L -> A01L1/00 -> A01L1/012 -> A01L1/0125 .

B -> B05 -> B05C -> B05C10/00 -> B05C10/87



Important: Each "Code" has a Definition (free text used to explain the
code).

A record is: Code + Definition



- I already indexed this CPC structure with a XML format it works fine.

- With this kind of structure I can't set a type of nested doc.

- A keyword that a user searches for can be found at several levels (parent and
child)



So, my tests:

If I set a field named "typedoc" with "parentDoc" or "leaf",

I'm facing an error when I used ParentFilter, ChildFilter, etc.

"Child query must only match non-parent docs"

q={!parent which="typedoc:parentDoc"}ti:details

fq=*,[child parentFilter="typedoc:parentDoc" childFilter="ti:details"]



I need to have the whole structure when I do a query. I mean, I need to have
all parents until level=1



My question is:



Has someone already indexed and used this kind of structure?

All the information I found uses a typedoc field.



Thanks for your help !



Cordialement, Best Regards

Bruno





Solr5.4: Trouble to use Xml Nested Documents

2018-11-29 Thread Bruno Mannina
Dear Solr Users,



I would like to use Solr to do searches on a hierarchical structure (for
those who know them: CPC and IPC).

Constraint: I use SOLR5.4 and I can't upgrade it.



I succeeded in importing my XML source file (see the end of this post for the
XML file).

I succeeded in searching it.



But I always get only the matched items, and I would also like to have "ALL"
parents.

If I delete the level=4 item, then I can use this q and fl:

q={!parent which="c_type:class"}ti:details

f=*, [child parentFilter="c_type=class" childFilter="ti:details"]



BUT, I get only 1 parent level !?

I found id=A01B and id=A01 but not id=A ???!!!



Now if I add the item level=4 and with the same q and fl

Then, I have an error:

Child query must only match non-parent docs

I read all lot of post concerning this problem but not a solution.



Do you think it's possible to do this kind of request to get matched docs
AND all upper-level parents. (i.e A A01 A01B)



Many thanks for your help,



Regards

Bruno







A

class

A

1

Human Necessity



   A01

   class

   A01

   2

   Agriculture; Forestry;
Animal Husbandry; Hunting; Trapping

   

   A01B

   class

   A01B

   3


   Soil
working in agriculture of forestry; parts, details, or accessoiries of
agricultural machines or implements, in general.

   

   A01B1/00

   leaf

   A01B1/00

   4

   Hands tools edge trimmers for lawns.

   

   

   

   A01C

   leaf

   A01C

   3


   Planting;Sowing;Fertilising combined with general working of
soil; parts.

   









RE: Solr5.4 - Indexing a big file (size = 2.4Go)

2018-05-30 Thread Bruno Mannina
Hi Erick,

I want to index this file because I received this file from my boss.

This file contains around 1.5M docs.

I think I will split this file and index the pieces. 
It will be better.

Thanks

-----Original Message-----
From: Erick Erickson [mailto:erickerick...@gmail.com] 
Sent: Wednesday, May 30, 2018 16:50
To: solr-user
Subject: Re: Solr5.4 - Indexing a big file (size = 2.4Go)

Why do you want to index a 2G file in the first place? You can't really do 
anything with it.

If you deliver it to a browser, the browser will churn forever. If you try to 
export it it'll suck up your bandwidth terribly.

If it's a bunch of individual docs (in Solr's xml format) about the only thing 
that makes sense is to break it up.

This sounds like an XY problem, you've asked how to do X (index a 2G
file) without telling us Y (what
the use-case is).

Best,
Erick

On Wed, May 30, 2018 at 7:18 AM, Bruno Mannina  
wrote:
> Dear Solr User,
>
>
>
> I get an "invalid content length" error when I try to index my file (an XML 
> file with a size of 2.4 GB).
>
>
>
> I use simpleposttool like in the documentation on my ubuntu server.
>
>>bin/post -port 1234 -c mycollection /home/bruno/2013.xml
>
>
>
> It works with smaller files but not with this one. I suppose it's the size.
>
>
>
> Is there a param I can change to allow big files?
>
>
>
> I changed the solrconfig params formdataUploadLimitInKB to 4096 
> and multipartUploadLimitInKB to 4096000 without success.
>
>
>
> Do you have an idea ?
>
>
>
> Many thanks for your help,
>
>
>
> Best Regards
>
> Bruno
>
>
>



Solr5.4 - Indexing a big file (size = 2.4Go)

2018-05-30 Thread Bruno Mannina
Dear Solr User,



I get an "invalid content length" error when I try to index my file (an XML
file with a size of 2.4 GB).



I use simpleposttool like in the documentation on my ubuntu server.

>bin/post -port 1234 -c mycollection /home/bruno/2013.xml



It works with smaller files but not with this one. I suppose it's the size.



Is there a param I can change to allow big files?



I changed the solrconfig params formdataUploadLimitInKB to 4096 and
multipartUploadLimitInKB to 4096000 without success.



Do you have an idea ?



Many thanks for your help,



Best Regards

Bruno
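Since the file is a flat list of documents, splitting it into post-sized chunks before running bin/post avoids the content-length limit altogether. A sketch using Python's streaming XML parser; it assumes Solr-style <add><doc>…</doc></add> input, so the tag names may need adjusting to the actual file:

```python
import os
import tempfile
import xml.etree.ElementTree as ET

def split_solr_xml(src: str, docs_per_chunk: int = 100_000) -> int:
    """Stream <doc> elements out of a huge <add> file into numbered
    chunk files next to the source; returns how many chunks were written."""
    def write_chunk(n: int, docs: list) -> None:
        with open(f"{src}.part{n:03d}.xml", "w", encoding="utf-8") as f:
            f.write("<add>\n" + "".join(docs) + "</add>\n")

    chunk, n_files = [], 0
    for _event, elem in ET.iterparse(src, events=("end",)):
        if elem.tag == "doc":
            chunk.append(ET.tostring(elem, encoding="unicode"))
            elem.clear()                       # keep memory flat
            if len(chunk) >= docs_per_chunk:
                n_files += 1
                write_chunk(n_files, chunk)
                chunk = []
    if chunk:
        n_files += 1
        write_chunk(n_files, chunk)
    return n_files

# tiny self-demo with a throwaway file
demo = os.path.join(tempfile.mkdtemp(), "big.xml")
with open(demo, "w", encoding="utf-8") as f:
    f.write("<add><doc><f>1</f></doc><doc><f>2</f></doc><doc><f>3</f></doc></add>")
print(split_solr_xml(demo, docs_per_chunk=2))  # 2 chunk files
```

Each chunk can then be posted separately with bin/post, which also makes a failed upload restartable from the last chunk.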





Wrong ngroups value result ?

2018-03-27 Thread Bruno Mannina
Dear Solr User,



I have several collections on a SOLR 5.4 (Ubuntu server).

Each collection have a same unique key "id".

Collections have sometimes common records.



Actually, I would like to use the group option, but the ngroups result is wrong.

SOLR finds the right number of items but sums the counts of each collection for
the ngroups value.



I know this is a known problem, but does a solution exist?



I have been looking for one for several hours without success.



Thanks for your help,

Bruno





RE: Solr 5.4.0: Colored Highlight and multi-value field ?

2017-10-06 Thread Bruno Mannina
Hi Erick,

Sorry for the late reply, I wasn't in my office this week...

So, I give more information:

* IC is a multi-value field defined like this:


* The request I use (i.e):
http://my_host/solr/collection/select?
q=ic:(A63C10* OR G06F22/086)
=0
=10
=json
=true
=pd+desc
=*
// HighLight
=true
=ti,ab,ic,inc,cpc,apc
=
=
=colored
=true
=true
=true
=999
=true

* Result:
I get only one color (in my case yellow) for all the different values found

* BUT *

If I use a non multi-value field like ti (title) with a query with some keywords


* Result (i.e. ti:(foo OR merge)):
I have a different color for each different term found


Questions:
- Is it because the IC field is not defined with all the term*="true" options?
- How can I get different colors without using pre and post tags?


Many thanks for your help !

-Message d'origine-
De : Erick Erickson [mailto:erickerick...@gmail.com]
Envoyé : mercredi 4 octobre 2017 15:48
À : solr-user
Objet : Re: Solr 5.4.0: Colored Highlight and multi-value field ?

How does it not work for you? Details matter, an example set of values and the 
response from Solr are good bits of info for us to have.

On Tue, Oct 3, 2017 at 3:59 PM, Bruno Mannina <bmann...@matheo-software.com>
wrote:

> Dear all,
>
>
>
> Is it possible to have a colored highlight in a multi-value field ?
>
>
>
> I succeeded in doing it on a text field but not on a multi-value field;
> there, SOLR uses hl.simple.pre / hl.simple.post as the tags.
>
>
>
> Thanks a lot for your help,
>
>
>
> Cordialement, Best Regards
>
> Bruno Mannina
>
> www.matheo-software.com
>
> www.patent-pulse.com
>
> Tél. +33 0 970 738 743
>
> Mob. +33 0 634 421 817
>
>
>
>
>
>





Solr 5.4.0: Colored Highlight and multi-value field ?

2017-10-03 Thread Bruno Mannina
Dear all,



Is it possible to have a colored highlight in a multi-value field ?



I succeeded in doing it on a text field but not on a multi-value field; there,
SOLR uses hl.simple.pre / hl.simple.post as the tags.



Thanks a lot for your help,



Cordialement, Best Regards

Bruno Mannina

 <http://www.matheo-software.com> www.matheo-software.com

 <http://www.patent-pulse.com> www.patent-pulse.com

Tél. +33 0 970 738 743

Mob. +33 0 634 421 817








Shards, delete duplicates ?

2017-04-14 Thread Bruno Mannina
Dear Solr users,



I have two collections C1 and C2

For C1 and C2 the unique key is ID.



IDs in C1 are normalized patent numbers, i.e. US + 12 digits + A1.

IDs in C2 are patent numbers as I receive them: US + 13 digits + A1 (a
leading 0 is added).



My collection C2 has a field named ID12, which is not defined as a unique
field.

ID12 is a copy of C1's ID field (US + 12 digits + A1).

Values in ID12 are unique across the whole C2 collection.



Data in C1_ID and C2_ID12 are the same.



I try to query both collections using shards in the URL.

It works fine, but I get duplicate documents. That's normal, I know.

Is there a method, a parameter, or anything else that allows me to tell
Solr to compare ID in C1 with ID12 in C2 in order to delete the duplicates?



Many thanks for your help,





Bruno Mannina

 <http://www.matheo-software.com> www.matheo-software.com

 <http://www.patent-pulse.com> www.patent-pulse.com

Tél. +33 0 430 650 788
Fax. +33 0 430 650 728






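Until Solr itself can be told to compare ID with ID12, one client-side option is to normalize IDs while merging the shard results and drop duplicates there (distributed search dedupes only on the uniqueKey). A sketch; the ID layout is taken from the description above:

```python
import re

def normalize_id(doc_id: str) -> str:
    """US + 13 digits + kind code -> US + 12 digits + kind code
    (strip one leading zero from the numeric part, if present)."""
    m = re.fullmatch(r'([A-Z]{2})0?(\d{12})([A-Z]\d?)', doc_id)
    return f"{m.group(1)}{m.group(2)}{m.group(3)}" if m else doc_id

def dedupe(docs: list[dict]) -> list[dict]:
    """Keep the first doc per normalized id, in merged-result order."""
    seen, out = set(), []
    for d in docs:
        key = normalize_id(d["id"])
        if key not in seen:
            seen.add(key)
            out.append(d)
    return out

docs = [{"id": "US123456789012A1"}, {"id": "US0123456789012A1"}]
print(dedupe(docs))  # [{'id': 'US123456789012A1'}]
```

A longer-term fix would be to store the normalized form in both collections at index time, so the shared uniqueKey makes the shard merge dedupe for free.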


Solr5, Clustering & exact phrase problem

2017-03-13 Thread Bruno Mannina
Dear Solr-User,



I’m trying to use solr clustering (Lingo algorithm) on my database (notices
with id, title, abstract fields)



Everything works fine when my query is simple (with or without Boolean
operators), but if I try an exact phrase like:

..=ti:“snowboard binding”&…



Then Solr generates only one cluster, named “other”, and puts all the notices
inside it.



As I have only been testing it for a short time, my solrconfig contains the
sample configuration that the example gives.

Of course, I changed the field names.



Do you know if I made a mistake or am missing something, or whether maybe
exact phrases are not supported by clustering?



Just one other question: I want to generate clusters using the abstract and
title fields. Is this exactly what I did in my solrconfig?

carrot.title = title

carrot.snippet = abstract



Thanks a lot for your help,



Bruno Mannina

 <http://www.matheo-software.com> www.matheo-software.com

 <http://www.patent-pulse.com> www.patent-pulse.com

Tél. +33 0 430 650 788
Fax. +33 0 430 650 728








Get docs with same value in one other field ?

2017-02-22 Thread Bruno Mannina
Hello all,



I’m facing a problem, and I would like to know if it’s possible to solve it
with one request in SOLR.

I have SOLR 5.



I have docs with several fields but here two are useful for us.

Field 1 : id (unique key)

Field 2 : fid (family Id)



i.e:



id:XXX

fid: 1254



id: YYY

fid: 1254



id: ZZZ

fid:3698



id: QQQ

fid: 3698

…



I query only by id in my project, and I would like my result to also include
all docs that have the same fid.

i.e. if I request :

..q=id:ZZZ&…



I get the docs ZZZ of course but also QQQ because QQQ_fid = ZZZ_fid



MoreLikeThis, Group, etc. don’t answer my question (but maybe I don’t know
how to use them for that)



Thanks for your help,



Bruno





Bruno Mannina

 <http://www.matheo-software.com> www.matheo-software.com

 <http://www.patent-pulse.com> www.patent-pulse.com

Tél. +33 0 430 650 788
Fax. +33 0 430 650 728



Stay in touch!








RE: Get docs with same value in one other field ?

2017-02-22 Thread Bruno Mannina
OK Alex, I will look for a better solution. I'm afraid of getting an OOM with a 
huge number of ids.

And yes, I already use a POST query; it was just to show my problem. Anyway, 
thanks for pointing that out as well.

-----Original Message-----
From: Alexandre Rafalovitch [mailto:arafa...@gmail.com]
Sent: Thursday, February 23, 2017 00:08
To: solr-user
Subject: Re: Get docs with same value in one other field ?

A thousand IDs could be painful to send and perhaps to run against.

At minimum, look into splitting your query into multiple variables (so you 
could reuse the list in both the direct and the join query). Also look at the terms 
query parser, which specializes in lists of IDs. You may also need to send 
your ID list as a POST, not a GET request, to avoid blowing the URL length.

Regards,
   Alex.

http://www.solr-start.com/ - Resources for Solr users, new and experienced


On 22 February 2017 at 17:55, Bruno Mannina <bmann...@free.fr> wrote:
> Just one more thing: I need to request up to 1000 ids. Actually I
> tested with 2 or 3 and it takes time (my db is around 100,000,000 docs, 128 GB 
> RAM).
>
> Do you think it could cause an OOM error if I test with up to 1000 ids?
>
> -Message d'origine-
> De : Bruno Mannina [mailto:bmann...@free.fr] Envoyé : mercredi 22
> février 2017 23:47 À : solr-user@lucene.apache.org Objet : RE: Get
> docs with same value in one other field ?
>
> Ye it's perfect !!! it works.
>
> Thanks David & Alexandre !
>
> -Message d'origine-
> De : David Hastings [mailto:hastings.recurs...@gmail.com]
> Envoyé : mercredi 22 février 2017 23:00 À :
> solr-user@lucene.apache.org Objet : Re: Get docs with same value in
> one other field ?
>
> sorry embedded link:
>
q={!join from=fid to=fid}id:ZZZ
>
> On Wed, Feb 22, 2017 at 4:58 PM, David Hastings < 
> hastings.recurs...@gmail.com> wrote:
>
>> for a reference to some examples:
>>
>> https://wiki.apache.org/solr/Join
>>
>> so you'd want something like:
>>
>> q={!join from=fid to=fid}id:ZZZ
>> <http://localhost:8983/solr/select?q=%7B!join+from=manu_id_s+to=id%7Dipod>
>>
>> I don't have much experience with this function, however.
>>
>>
>>
>> On Wed, Feb 22, 2017 at 4:40 PM, Alexandre Rafalovitch
>> <arafa...@gmail.com
>> > wrote:
>>
>>> Sounds like two clauses with the second clause being a JOINT search
>>> where you match by ID and then join on FID.
>>>
>>> Would that work?
>>>
>>> Regards,
>>>Alex.
>>> 
>>> http://www.solr-start.com/ - Resources for Solr users, new and
>>> experienced
>>>
>>>
>>> On 22 February 2017 at 16:27, Bruno Mannina <bmann...@free.fr> wrote:
>>> >
>>> >
>>> > Hello all,
>>> >
>>> >
>>> >
>>> > I'm facing a problem that I would like to know if it's possible to
>>> > do it with one request in SOLR.
>>> >
>>> > I have SOLR 5.
>>> >
>>> >
>>> >
>>> > I have docs with several fields but here two are useful for us.
>>> >
>>> > Field 1 : id (unique key)
>>> >
>>> > Field 2 : fid (family Id)
>>> >
>>> >
>>> >
>>> > i.e:
>>> >
>>> >
>>> >
>>> > id:XXX
>>> >
>>> > fid: 1254
>>> >
>>> >
>>> >
>>> > id: YYY
>>> >
>>> > fid: 1254
>>> >
>>> >
>>> >
>>> > id: ZZZ
>>> >
>>> > fid:3698
>>> >
>>> >
>>> >
>>> > id: QQQ
>>> >
>>> > fid: 3698
>>> >
>>> > .
>>> >
>>> >
>>> >
>>> > I request only by id in my project, and I would like in my result
>>> > have
>>> also
>>> > all docs that have the same fid .
>>> >
>>> > i.e. if I request :
>>> >
>>> > ..q=id:ZZZ&.
>>> >
>>> >
>>> >
>>> > I get the docs ZZZ of course but also QQQ because QQQ_fid =
>>> > ZZZ_fid
>>> >
>>> >
>>> >
>>> > MoreLikeThis, Group, etc. don't answer to my question (but may I
>>> > don't
>>> know
>>> > how to use it to do that)
>>> >
>>> >
>>> >
>>> > Thanks for your help,
>>> >
>>> >
>>> >
>>> > Bruno
>>> >
>>> >
>>> >
>>> >
>>> >
>>> >
>>> >
>>>
>>
>>
>
>
>


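For reference, the documented shape of the join used in this thread is {!join from=… to=…}. A small sketch that builds the request parameters for the fid example, using only the Python standard library; the self-join returns the queried doc plus every doc sharing its fid:

```python
from urllib.parse import urlencode

def family_query(doc_id: str) -> dict:
    """Params for {!join from=fid to=fid}id:<doc_id>; the self-join matches
    every doc whose fid equals the fid of doc_id, doc_id included."""
    return {"q": f"{{!join from=fid to=fid}}id:{doc_id}", "wt": "json"}

print(family_query("ZZZ")["q"])  # {!join from=fid to=fid}id:ZZZ
print(urlencode(family_query("ZZZ")))
```

The encoded string can be appended to the collection's /select URL, or sent as a POST body when the id list grows large.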



RE: Get docs with same value in one other field ?

2017-02-22 Thread Bruno Mannina
Just one more thing: I need to request up to 1000 ids.
Actually I tested with 2 or 3 and it takes time (my db is around 100,000,000 
docs, 128 GB RAM).

Do you think it could cause an OOM error if I test with up to 1000 ids?

-----Original Message-----
From: Bruno Mannina [mailto:bmann...@free.fr] 
Sent: Wednesday, February 22, 2017 23:47
To: solr-user@lucene.apache.org
Subject: RE: Get docs with same value in one other field ?

Yes, it's perfect!!! It works.

Thanks David & Alexandre !

-----Original Message-----
From: David Hastings [mailto:hastings.recurs...@gmail.com]
Sent: Wednesday, February 22, 2017 23:00
To: solr-user@lucene.apache.org
Subject: Re: Get docs with same value in one other field ?

sorry embedded link:

q={!join from=fid to=fid}id:ZZZ

On Wed, Feb 22, 2017 at 4:58 PM, David Hastings < hastings.recurs...@gmail.com> 
wrote:

> for a reference to some examples:
>
> https://wiki.apache.org/solr/Join
>
> so you'd want something like:
>
> q={!join from=fid to=fid}id:ZZZ
> <http://localhost:8983/solr/select?q=%7B!join+from=manu_id_s+to=id%7Dipod>
>
> I don't have much experience with this function, however.
>
>
>
> On Wed, Feb 22, 2017 at 4:40 PM, Alexandre Rafalovitch 
> <arafa...@gmail.com
> > wrote:
>
>> Sounds like two clauses with the second clause being a JOINT search 
>> where you match by ID and then join on FID.
>>
>> Would that work?
>>
>> Regards,
>>Alex.
>> 
>> http://www.solr-start.com/ - Resources for Solr users, new and 
>> experienced
>>
>>
>> On 22 February 2017 at 16:27, Bruno Mannina <bmann...@free.fr> wrote:
>> >
>> >
>> > Hello all,
>> >
>> >
>> >
>> > I'm facing a problem that I would like to know if it's possible to 
>> > do it with one request in SOLR.
>> >
>> > I have SOLR 5.
>> >
>> >
>> >
>> > I have docs with several fields but here two are useful for us.
>> >
>> > Field 1 : id (unique key)
>> >
>> > Field 2 : fid (family Id)
>> >
>> >
>> >
>> > i.e:
>> >
>> >
>> >
>> > id:XXX
>> >
>> > fid: 1254
>> >
>> >
>> >
>> > id: YYY
>> >
>> > fid: 1254
>> >
>> >
>> >
>> > id: ZZZ
>> >
>> > fid:3698
>> >
>> >
>> >
>> > id: QQQ
>> >
>> > fid: 3698
>> >
>> > .
>> >
>> >
>> >
>> > I request only by id in my project, and I would like in my result 
>> > have
>> also
>> > all docs that have the same fid .
>> >
>> > i.e. if I request :
>> >
>> > ..q=id:ZZZ&.
>> >
>> >
>> >
>> > I get the docs ZZZ of course but also QQQ because QQQ_fid = ZZZ_fid
>> >
>> >
>> >
>> > MoreLikeThis, Group, etc. don't answer to my question (but may I 
>> > don't
>> know
>> > how to use it to do that)
>> >
>> >
>> >
>> > Thanks for your help,
>> >
>> >
>> >
>> > Bruno
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>>
>
>





RE: Get docs with same value in one other field ?

2017-02-22 Thread Bruno Mannina
Yes, it's perfect!!! It works.

Thanks David & Alexandre !

-----Original Message-----
From: David Hastings [mailto:hastings.recurs...@gmail.com]
Sent: Wednesday, February 22, 2017 23:00
To: solr-user@lucene.apache.org
Subject: Re: Get docs with same value in one other field ?

sorry embedded link:

q={!join from=fid to=fid}id:ZZZ

On Wed, Feb 22, 2017 at 4:58 PM, David Hastings < hastings.recurs...@gmail.com> 
wrote:

> for a reference to some examples:
>
> https://wiki.apache.org/solr/Join
>
> so you'd want something like:
>
> q={!join+from=fid+to=fid}id:ZZZ
> (wiki example: http://localhost:8983/solr/select?q=%7B!join+from=manu_id_s+to=id%7Dipod)
>
> I don't have much experience with this function, however.
>
>
>
> On Wed, Feb 22, 2017 at 4:40 PM, Alexandre Rafalovitch
> <arafa...@gmail.com
> > wrote:
>
>> Sounds like two clauses, with the second clause being a JOIN search
>> where you match by ID and then join on FID.
>>
>> Would that work?
>>
>> Regards,
>>Alex.
>> ----
>> http://www.solr-start.com/ - Resources for Solr users, new and
>> experienced
>>
>>
>> On 22 February 2017 at 16:27, Bruno Mannina <bmann...@free.fr> wrote:
>> > [...]
>>
>
>





Get docs with same value in one other field ?

2017-02-22 Thread Bruno Mannina


Hello all,



I'm facing a problem, and I would like to know if it's possible to solve it
with one request in Solr.

I have SOLR 5.



I have docs with several fields, but two of them are relevant here.

Field 1 : id (unique key)

Field 2 : fid (family Id)



i.e:



id:XXX

fid: 1254



id: YYY

fid: 1254



id: ZZZ

fid:3698



id: QQQ

fid: 3698

.



I query only by id in my project, and I would like my result to also include
all docs that have the same fid.

i.e. if I request :

..q=id:ZZZ&.



I get the docs ZZZ of course but also QQQ because QQQ_fid = ZZZ_fid



MoreLikeThis, Group, etc. don't answer my question (but maybe I don't know
how to use them to do that)



Thanks for your help,



Bruno









Re: Strange error when I try to copy....

2016-09-09 Thread Bruno Mannina

On 09/09/2016 at 17:57, Shawn Heisey wrote:

On 9/8/2016 9:41 AM, Bruno Mannina wrote:

- I stop SOLR 5.4 on Ubuntu 14.04LTS - 16Go - i3-2120 CPU @ 3.30Ghz

- I do a simple directory copy of /data to my backup HDD (from a 2TB SATA
drive to a 2TB SATA drive connected directly to the motherboard).

All files are copied fine but one not ! the biggest (~65Go) failed.

I have the message : "Error splicing file: Input/output error"

This isn't a Solr issue, which is easy to determine by the fact that
you've stopped Solr and it's not even running.  It's a problem with the
filesystem, probably the destination filesystem.

The most common reason that I have found for this error is a destination
filesystem that is incapable of holding a large file -- which can happen
when the disk is formatted fat32 instead of ntfs or a Linux filesystem.
You can have a 2TB filesystem with fat32, but no files larger than 4GB
-- so your 65GB file won't fit.

I think you're going to need to reformat that external drive with
another filesystem.  If you choose NTFS, you'll be able to use the disk
on either Linux or Windows.
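Shawn's point about the FAT32 file-size ceiling can be sanity-checked in a few lines; a sketch (the ~65GB figure is the segment size from this thread):

```python
# FAT32 stores file sizes in a 32-bit field: max file size is 2**32 - 1 bytes.
FAT32_MAX_FILE_SIZE = 2**32 - 1  # just under 4 GiB

def fits_on_fat32(size_bytes: int) -> bool:
    """True if a file of this size can exist on a FAT32 filesystem."""
    return size_bytes <= FAT32_MAX_FILE_SIZE

# The ~65GB .fdt segment from the thread cannot fit, so the copy fails;
# smaller segment files copy fine.
print(fits_on_fat32(65 * 2**30))  # -> False
print(fits_on_fat32(3 * 2**30))   # -> True
```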

Thanks,
Shawn



Hi Shawn,

First thanks for your answer, effectively it's a little bit clear.
Tonight I will check the file system of my hdd.

And sorry for this question out of solr subject.

Cdlt,
Bruno






Strange error when I try to copy....

2016-09-09 Thread Bruno Mannina

Dear Solr Users,

I have been using Solr for several years, and for the past two weeks I have
had a problem when I try to copy my Solr index.

My Solr index is around 180GB (~100,000,000 docs, ~3KB per doc)

My method to save my index every Sunday:

- I stop Solr 5.4 on Ubuntu 14.04 LTS - 16GB RAM - i3-2120 CPU @ 3.30GHz

- I do a simple directory copy of /data to my backup HDD (from a 2TB SATA
drive to a 2TB SATA drive connected directly to the motherboard).

All files copy fine except one: the biggest (~65GB) fails.

I get the message: "Error splicing file: Input/output error"

I also tried on Windows (I dual-boot); there I get a "redundancy error".

I checked my HDD: no error. I checked the file "_k46.fdt": no error. I can
delete docs and add docs; my database can be reached and works fine.

Does someone have an idea how to back up my database, or why I get this error?

Many thanks for your help,

Sincerely,

Bruno







Re: How to index text field with html entities ?

2016-07-30 Thread Bruno Mannina

Thanks, Shawn, for these clarifications.

On 30/07/2016 at 00:43, Shawn Heisey wrote:

On 7/29/2016 4:05 PM, Bruno Mannina wrote:

after checking my log, it seems that it only concerns some HTML entities.
No problem with &amp; but I have problems with:



etc...

Those are valid *HTML* entities, but they are not valid *XML* entities.
The list of entities that are valid in XML is quite short -- there are
only five of them.

https://en.wikipedia.org/wiki/List_of_XML_and_HTML_character_entity_references#Predefined_entities_in_XML

When Solr processes XML, it is only going to convert entities that are
valid for XML -- the five already mentioned.  It will fail on the other
247 entities that are only valid for HTML.
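Python's standard library draws the same line, which is a handy way to see the distinction; a sketch (not part of Solr itself):

```python
from xml.sax.saxutils import unescape

# xml.sax.saxutils handles only XML-predefined entities by default
# (&amp; &lt; &gt;), mirroring the short list an XML parser accepts.
assert unescape("&amp; &lt; &gt;") == "& < >"

# An HTML-only entity passes through untouched -- a strict XML parser
# would reject it as an undeclared entity.
assert unescape("&eacute;") == "&eacute;"
print("ok")
```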

If you are seeing the problem with &amp; (which is one of the five valid
XML entities) then we'll need the Solr version and the full error
message/stacktrace from the solr logfile.

Thanks,
Shawn








Re: How to index text field with html entities ?

2016-07-29 Thread Bruno Mannina

Hi Chris,

Thanks for your answer, and I add a little thing,

after checking my log, it seems that it only concerns some HTML entities.
No problem with &amp; but I have problems with:



etc...

I will check your answer to find a solution,

Thanks !

On 29/07/2016 at 23:58, Chris Hostetter wrote:

: I have several xml files that contains html entities in some fields.

...

: If I set my field like this:
:
: Brown  Gammon
:
: Solr generates error "Undeclared general entity"

...because that's not valid XML...

: if I add CDATA like this:
:
: 
:
: it seems that I can't search with the &

...because that is valid xml, and tells solr you want the literal string
"Brown  Gammon" to be indexed -- given a typical analyzer you are
probably getting either "&amp;" or "amp" as a term in your index.

: Could you help me to find the right syntax ?

the client code you are using for indexing can either "parse" these HTML
snippets using an HTML parser, and then send solr the *real* string you
want to index, or you can configure solr with something like
HTMLStripFieldUpdateProcessorFactory (if you want both the indexed form
and the stored form to be plain text) or HTMLStripCharFilterFactory (if
you want to preserve the html markup in the stored value, but strip it as
part of the analysis chain for indexing).


http://lucene.apache.org/solr/6_1_0/solr-core/org/apache/solr/update/processor/HTMLStripFieldUpdateProcessorFactory.html
http://lucene.apache.org/core/6_1_0/analyzers-common/org/apache/lucene/analysis/charfilter/HTMLStripCharFilterFactory.html
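The first option Chris mentions, decoding the HTML entities client-side before sending the document to Solr, is a one-liner in Python's standard library; a sketch:

```python
import html

# Decode HTML entities client-side, then index the resulting plain string.
raw = "Brown &amp; Gammon"
clean = html.unescape(raw)
print(clean)  # -> Brown & Gammon

# HTML-only entities that XML parsers reject are handled too:
print(html.unescape("caf&eacute;"))  # -> café
```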


-Hoss
http://www.lucidworks.com/







How to index text field with html entities ?

2016-07-29 Thread Bruno Mannina

Dear Solr User,

Solr 5.0.1

I have several xml files that contains html entities in some fields.

I have an author field (English text) with this kind of text:

Brown  Gammon

If I set my field like this:

Brown  Gammon

Solr generates error "Undeclared general entity"

if I add CDATA like this:



it seems that I can't search with the &

au:"brown & gammon"

Could you help me to find the right syntax ?

Thanks a lot,

Bruno







Newbie: Searching across 2 collections ?

2016-01-06 Thread Bruno Mannina

Hi All,

Solr 5.4, Ubuntu

I thought it was simple to query across two collections with the same
schema, but apparently it is not.
I have one Solr instance running, with 300,000 records in each collection.

I tried this request, but did not get results from both collections:

http://my_adress:my_port/solr/C1/select?collection=C1,C2&q=fid:34520196&wt=json

this request returns only C1 results and if I do:

http://my_adress:my_port/solr/C2/select?collection=C1,C2&q=fid:34520196&wt=json

it returns only C2 results.

I have 5 identical fields on both collection
id, fid, st, cc, timestamp
where id is the unique key field.

Could someone explain why it doesn't work?

Thanks a lot !
Bruno




Re: Newbie: Searching across 2 collections ?

2016-01-06 Thread Bruno Mannina

yes id value is unique in C1 and unique in C2.
id in C1 is never present in C2
id in C2 is never present in C1

On 06/01/2016 11:12, Binoy Dalal wrote:

Are the id values for docs in both collections exactly the same?
To get proper results, the ids should be unique across both the cores.

On Wed, 6 Jan 2016, 15:11 Bruno Mannina <bmann...@free.fr> wrote:


Hi All,

Solr 5.4, Ubuntu

I thought it was simple to request across two collections with the same
schema but not.
I have one solr instance launch. 300 000 records in each collection.

I try to use this request without having both results:

http://my_adress:my_port
/solr/C1/select?collection=C1,C2=fid:34520196=json

this request returns only C1 results and if I do:

http://my_adress:my_port
/solr/C2/select?collection=C1,C2=fid:34520196=json

it returns only C2 results.

I have 5 identical fields on both collection
id, fid, st, cc, timestamp
where id is the unique key field.

Can someone could explain me why it doesn't work ?

Thanks a lot !
Bruno

---
L'absence de virus dans ce courrier électronique a été vérifiée par le
logiciel antivirus Avast.
http://www.avast.com

--

Regards,
Binoy Dalal







Re: Newbie: Searching across 2 collections ?

2016-01-06 Thread Bruno Mannina

Hi Susheel, Emir,

yes, I checked, and I have one result in c1 and one in c2 with the same query
fid:34520196


http://xxx.xxx.xxx.xxx:/solr/c1/select?q=fid:34520196&wt=json&indent=true&fl=id,fid,cc*,st&collection=c1,c2

{
  "responseHeader":{
    "status":0,
    "QTime":1,
    "params":{
      "fl":"fid,cc*,st",
      "indent":"true",
      "q":"fid:34520196",
      "collection":"c1,c2",
      "wt":"json"}},
  "response":{"numFound":1,"start":0,"docs":[
      {
        "id":"EP1680447",
        "st":"LAPSED",
        "fid":"34520196"}]
  }
}


http://xxx.xxx.xxx.xxx:/solr/c2/select?q=fid:34520196&wt=json&indent=true&fl=id,fid,cc*,st&collection=c1,c2

{
  "responseHeader":{
"status":0,
"QTime":0,
"params":{
  "fl":"id,fid,cc*,st",
  "indent":"true",
  "q":"fid:34520196",
  "collection":"c1,c2",
  "wt":"json"}},
  "response":{"numFound":1,"start":0,"docs":[
  {
"id":"WO2005040212",
"st":"PENDING",
"cc_CA":"LAPSED",
"cc_EP":"LAPSED",
"cc_JP":"PENDING",
"cc_US":"LAPSED",
"fid":"34520196"}]
  }}


I use the same xxx.xxx.xxx.xxx: (server:port) for both.
The unique key field in both C1 and C2 is: id

The id data in C1 is different from the id data in C2.

Do I need to configure something in Solr?

thanks,
Bruno

On 06/01/2016 14:56, Emir Arnautovic wrote:

Hi Bruno,
Can you check the counts? Is it possible that the first page only contains
results from the collection you sent the request to, so you assumed it
returns only results from a single collection?


Thanks,
Emir

On 06.01.2016 14:33, Susheel Kumar wrote:

Hi Bruno,

I just tested this scenario on my local Solr 5.3.1 and it returned results
from two identical collections. I doubt it is broken in 5.4; just
double-check that you are not missing anything else.

Thanks,
Susheel

http://localhost:8983/solr/c1/select?q=id_type%3Ahello&wt=json&indent=true&collection=c1,c2



"responseHeader": {"status": 0, "QTime": 98, "params": {"q": "id_type:hello",
  "indent": "true", "collection": "c1,c2", "wt": "json"}},
"response": {"numFound": 2, "start": 0, "maxScore": 1, "docs": [
  {"id": "1", "id_type": "hello", "_version_": 1522623395043213300},
  {"id": "3", "id_type": "hello", "_version_": 1522623422397415400}]}
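Susheel's result can be checked programmatically; a sketch with the JSON body restored from the quoted output (numFound is 2 because the hits came from both c1 and c2):

```python
import json

# Response body as quoted above, restored to valid JSON.
raw = """
{
  "responseHeader": {"status": 0, "QTime": 98,
    "params": {"q": "id_type:hello", "indent": "true",
               "collection": "c1,c2", "wt": "json"}},
  "response": {"numFound": 2, "start": 0, "maxScore": 1,
    "docs": [{"id": "1", "id_type": "hello", "_version_": 1522623395043213300},
             {"id": "3", "id_type": "hello", "_version_": 1522623422397415400}]}
}
"""
data = json.loads(raw)
# numFound == 2 shows the collection parameter merged hits from c1 and c2.
print(data["response"]["numFound"])                 # -> 2
print([d["id"] for d in data["response"]["docs"]])  # -> ['1', '3']
```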

On Wed, Jan 6, 2016 at 6:13 AM, Bruno Mannina <bmann...@free.fr> wrote:
[...]



Re: Newbie: Searching across 2 collections ?

2016-01-06 Thread Bruno Mannina

Hi Ester,

yes, I saw it, but if I use:

q={!join from=fid to=fid}fid:34520196 (with or without &collection=c1,c2)

I only get the result from the collection used in the select (c1).

On 06/01/2016 17:52, esther.quan...@lucidworks.com wrote:

Hi Bruno,

You might consider using the JoinQueryParser. Details here : 
https://cwiki.apache.org/confluence/display/solr/Other+Parsers#OtherParsers-JoinQueryParser

Best,
Esther


On 6 Jan 2016 at 08:48, Bruno Mannina <bmann...@free.fr> wrote:
[...]

Re: Newbie: Searching across 2 collections ?

2016-01-06 Thread Bruno Mannina

:( it doesn't work for me

http://my_adress:my_port/solr/c1/select?q={!join from=fid to=fid 
fromIndex=c2}fid:34520196&wt=json

the result is always the same; it only returns results for c1.
34520196 has results in both collections.



On 06/01/2016 18:16, Binoy Dalal wrote:

Bruno,
Use join like so:
{!join from=f1 to=f2 fromIndex=c2}
On c1

On Wed, 6 Jan 2016, 22:30 Bruno Mannina <bmann...@free.fr> wrote:
[...]

Re: Newbie: Searching across 2 collections ?

2016-01-06 Thread Bruno Mannina

I have a dev server; I will do some tests on it...

On 06/01/2016 17:31, Susheel Kumar wrote:

I'd suggest setting up some test data locally and trying this out.
That will confirm your understanding.

Thanks,
Susheel

On Wed, Jan 6, 2016 at 10:39 AM, Bruno Mannina <bmann...@free.fr> wrote:
[...]



Re: Newbie: Searching across 2 collections ?

2016-01-06 Thread Bruno Mannina
Same result on my dev server; it seems the collection param has no effect
on the query...

Q: I don't see the "collection" param for the select handler in the Solr 5.4
docs; is it still present in version 5.4?


On 06/01/2016 17:38, Bruno Mannina wrote:
[...]



Re: Newbie: Searching across 2 collections ?

2016-01-06 Thread Bruno Mannina

Yeah! It works with your method!

thanks a lot Esther !


On 06/01/2016 19:15, Esther-Melaine Quansah wrote:

Ok, so join won’t work. Distributed search is your answer. This worked for me:

http://localhost:8983/solr/temp/select?shards=localhost:8983/solr/job,localhost:8983/solr/temp&q=*:*

so for you it’d look something like

http://localhost:8983/solr/c1/select?shards=localhost:8983/solr/c1,localhost:8983/solr/c2=fid:34520196
 
<http://localhost:8983/solr/c1/select?shards=localhost:8983/solr/c1,localhost:8983/solr/c2=fid:34520196>
and obviously, you’ll just choose the ports that correspond to your 
configuration.

Esther
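Esther's distributed-search request can be assembled like this (a sketch; localhost:8983 stands in for the poster's my_adress:my_port, and the core names are the ones from the thread):

```shell
# Build the old-style distributed search URL: the request is sent to one
# core, and the shards parameter lists every core whose results to merge.
host="localhost:8983"                 # assumption: replace with your server
shards="$host/solr/c1,$host/solr/c2"  # note: no http:// prefix inside shards
url="http://$host/solr/c1/select?shards=$shards&q=fid:34520196&wt=json"
echo "$url"
# curl "$url"   # run against a live server; ids must be unique across cores
```

As Binoy noted earlier in the thread, the merged result is only correct when the uniqueKey values do not collide across the two cores.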

On Jan 6, 2016, at 9:36 AM, Bruno Mannina <bmann...@free.fr> wrote:

:( not work for me

http://my_adress:my_port/solr/c1/select?q={!join from=fid to=fid 
fromIndex=c2}fid:34520196=json

the result is always the same; it answers only for c1.
34520196 has results in both collections



On 06/01/2016 18:16, Binoy Dalal wrote:

Bruno,
Use join like so:
{!join from=f1 to=f2 fromIndex=c2}
On c1

On Wed, 6 Jan 2016, 22:30 Bruno Mannina <bmann...@free.fr> wrote:


Hi Ester,

yes, i saw it, but if I use:

q={!join from=fid to=fid}fid:34520196 (with or not =c1,c2)

I have only the result from the collection used in the select/c1

On 06/01/2016 17:52, esther.quan...@lucidworks.com wrote:

Hi Bruno,

You might consider using the JoinQueryParser. Details here :

https://cwiki.apache.org/confluence/display/solr/Other+Parsers#OtherParsers-JoinQueryParser

Best,
Esther
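For reference, a sketch of what a JoinQueryParser request from the reference guide would look like over HTTP (host and port are placeholders). Note that a join run against c1 returns only c1 documents whose fid matches a c2 document; it does not merge the two result sets, which is why it could not combine results in this thread:

```shell
# Local-params syntax must be URL-encoded when placed in a GET request.
q='{!join from=fid to=fid fromIndex=c2}fid:34520196'
encoded=$(printf '%s' "$q" | sed -e 's/{/%7B/g' -e 's/}/%7D/g' -e 's/ /%20/g')
echo "http://localhost:8983/solr/c1/select?q=$encoded&wt=json"
```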


On 6 Jan 2016 at 08:48, Bruno Mannina <bmann...@free.fr> wrote:

Same result on my dev server; it seems that the collection param has no
effect on the query...

Q: I don't see the "collection" param for the select handler in the Solr
5.4 docs; is it still present in version 5.4?

On 06/01/2016 17:38, Bruno Mannina wrote:

I have a dev' server, I will do some test on it...

On 06/01/2016 17:31, Susheel Kumar wrote:

I'd suggest setting up some test data locally and trying this
out.  This will confirm your understanding.

Thanks,
Susheel


On Wed, Jan 6, 2016 at 10:39 AM, Bruno Mannina <bmann...@free.fr>

wrote:

Hi Susheel, Emir,

yes I check, and I have one result in c1 and in c2 with the same

query

fid:34520196

http://xxx.xxx.xxx.xxx:


/solr/c1/select?q=fid:34520196=json=true=id,fid,cc*,st=c1,c2

{ "responseHeader":{ "status":0, "QTime":1, "params":{

"fl":"fid,cc*,st",

"indent":"true", "q":"fid:34520196", "collection":"c1,c2",

"wt":"json"}},

"response":{"numFound":1,"start":0,"docs":[ {

  "id":"EP1680447",
  "st":"LAPSED",
  "fid":"34520196"}]
}
}


http://xxx.xxx.xxx.xxx:


/solr/c2/select?q=fid:34520196=json=true=id,fid,cc*,st=c1,c2

{
"responseHeader":{
  "status":0,
  "QTime":0,
  "params":{
"fl":"id,fid,cc*,st",
"indent":"true",
"q":"fid:34520196",
"collection":"c1,c2",
"wt":"json"}},
"response":{"numFound":1,"start":0,"docs":[
{
  "id":"WO2005040212",
  "st":"PENDING",
  "cc_CA":"LAPSED",
  "cc_EP":"LAPSED",
  "cc_JP":"PENDING",
  "cc_US":"LAPSED",
  "fid":"34520196"}]
}}


I have the same xxx.xxx.xxx.xxx: (server:port).
unique key field C1, C2 : id

id data in C1 is different of id data in C2

Must I config/set something in solr ?

thanks,
Bruno


On 06/01/2016 14:56, Emir Arnautovic wrote:


Hi Bruno,
Can you check counts? Is it possible that the first page only contains
results from the collection you sent the request to, so you assumed it
returns results from a single collection only?

Thanks,
Emir


On 06.01.2016 14:33, Susheel Kumar wrote:

Hi Bruno,

I just tested this scenario in my local solr 5.3.1 and it returned
results
from two identical collections. I doubt it is broken in 5.4; just
double-check that you are not missing anything else.

Thanks,
Susheel




http://localhost:8983/solr/c1/select?q=id_type%3Ahello=json=true=c1,c2

responseHeader": {"status": 0,"QTime": 98,"params": {"q":
"id_type:hello","
indent": "true","collection": "c1,c2","wt": "json"}},
response": {"numFound": 2,"start"

Re: Newbie: Searching across 2 collections ?

2016-01-06 Thread Bruno Mannina

Hi Shawn,

thanks for this info, I use solr alone on my own server.

On 06/01/2016 20:13, Shawn Heisey wrote:

On 1/6/2016 2:41 AM, Bruno Mannina wrote:

I try to use this request without having both results:

http://my_adress:my_port/solr/C1/select?collection=C1,C2=fid:34520196=json


this request returns only C1 results and if I do:

http://my_adress:my_port/solr/C2/select?collection=C1,C2=fid:34520196=json


it returns only C2 results.

Are you running in SolrCloud mode (with zookeeper)?  If you're not, then
the collection parameter doesn't do anything, and old-style distributed
search (with the shards parameter) will be your only option.

Thanks,
Shawn









Re: Newbie: Searching across 2 collections ?

2016-01-06 Thread Bruno Mannina

Hi,

is it possible that this was the problem described by Shawn, and that you
have SolrCloud mode (with zookeeper)?


The solution gives by Esther works fine so it's ok for me :)

**

Are you running in SolrCloud mode (with zookeeper)?  If you're not, then
the collection parameter doesn't do anything, and old-style distributed
search (with the shards parameter) will be your only option.

Thanks,
Shawn

***

On 06/01/2016 19:17, Susheel Kumar wrote:

Hi Bruno,  I just tested on 5.4 for your sake and it works fine.  You are
somewhere goofing up.  Please create a new simple schema different from
your use case with 2-3 fields with 2-3 documents and test this out
independently of your current problem.  That's the suggestion I can make,
and I did the same to confirm this.

On Wed, Jan 6, 2016 at 11:48 AM, Bruno Mannina <bmann...@free.fr> wrote:


Same result on my dev server; it seems that the collection param has no
effect on the query...

Q: I don't see the "collection" param for the select handler in the Solr
5.4 docs; is it still present in version 5.4?


On 06/01/2016 17:38, Bruno Mannina wrote:


I have a dev' server, I will do some test on it...

On 06/01/2016 17:31, Susheel Kumar wrote:


I'd suggest setting up some test data locally and trying this
out.  This will confirm your understanding.

Thanks,
Susheel

On Wed, Jan 6, 2016 at 10:39 AM, Bruno Mannina <bmann...@free.fr> wrote:

Hi Susheel, Emir,

yes I check, and I have one result in c1 and in c2 with the same query
fid:34520196

http://xxx.xxx.xxx.xxx:
/solr/c1/select?q=fid:34520196=json=true=id,fid,cc*,st=c1,c2


{ "responseHeader":{ "status":0, "QTime":1, "params":{
"fl":"fid,cc*,st",
"indent":"true", "q":"fid:34520196", "collection":"c1,c2",
"wt":"json"}},
"response":{"numFound":1,"start":0,"docs":[ {

  "id":"EP1680447",
  "st":"LAPSED",
  "fid":"34520196"}]
}
}


http://xxx.xxx.xxx.xxx:
/solr/c2/select?q=fid:34520196=json=true=id,fid,cc*,st=c1,c2


{
"responseHeader":{
  "status":0,
  "QTime":0,
  "params":{
"fl":"id,fid,cc*,st",
"indent":"true",
"q":"fid:34520196",
"collection":"c1,c2",
"wt":"json"}},
"response":{"numFound":1,"start":0,"docs":[
{
  "id":"WO2005040212",
  "st":"PENDING",
  "cc_CA":"LAPSED",
  "cc_EP":"LAPSED",
  "cc_JP":"PENDING",
  "cc_US":"LAPSED",
  "fid":"34520196"}]
}}


I have the same xxx.xxx.xxx.xxx: (server:port).
unique key field C1, C2 : id

id data in C1 is different of id data in C2

Must I config/set something in solr ?

thanks,
Bruno


On 06/01/2016 14:56, Emir Arnautovic wrote:

Hi Bruno,

Can you check counts? Is it possible that the first page only contains
results from the collection you sent the request to, so you assumed it
returns results from a single collection only?

Thanks,
Emir

On 06.01.2016 14:33, Susheel Kumar wrote:

Hi Bruno,

I just tested this scenario in my local solr 5.3.1 and it returned
results
from two identical collections. I doubt it is broken in 5.4; just
double-check that you are not missing anything else.

Thanks,
Susheel



http://localhost:8983/solr/c1/select?q=id_type%3Ahello=json=true=c1,c2

{"responseHeader": {"status": 0, "QTime": 98, "params": {
    "q": "id_type:hello", "indent": "true", "collection": "c1,c2", "wt": "json"}},
 "response": {"numFound": 2, "start": 0, "maxScore": 1, "docs": [
    {"id": "1", "id_type": "hello", "_version_": 1522623395043213300},
    {"id": "3", "id_type": "hello", "_version_": 1522623422397415400}]}}

On Wed, Jan 6, 2016 at 6:13 AM, Bruno Mannina <bmann...@free.fr>
wrote:

yes id value is unique in C1 and unique in C2.


id in C1 is never present in C2
id in C2 is never present in C1


Le 06/01/2016 11:12, Binoy Dalal a écrit :

Are the id values for docs in both collections exactly the same?


To get proper results, the ids should be unique across both the
cores.

On Wed, 6 Jan 2016, 15:11 Bruno Mannina <bmann...@free.fr> wrote:

Hi All,

Solr 5.4, Ubuntu

I thought it was simple to request across two collections with the
same
schema but not.
I have one solr instance launch. 3

Re: Wildcard "?" ?

2015-10-22 Thread Bruno Mannina

Upayavira,

Thanks a lot for these information

Regards,
Bruno

On 21/10/2015 19:24, Upayavira wrote:

regexp will match the whole term. So, if you have stemming on, magnetic
may well stem to magnet, and that is the term against which the regexp
is executed.

If you want to do the regexp against the whole field, then you need to
do it against a string version of that field.

The process of using a regexp (and a wildcard for that matter) is:
  * search through the list of terms in your field for terms that match
  your regexp (uses an FST for speed)
  * search for documents that contain those resulting terms

Upayavira
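Upayavira's point about whole-term matching can be demonstrated outside Solr: grep -x anchors a pattern to the whole line, which mirrors how Lucene implicitly anchors a regexp query to the whole term.

```shell
# "magnet.?" matches the whole term plus at most one extra character:
echo magnet   | grep -qxE 'magnet.?' && echo "magnet: match"
echo magneto  | grep -qxE 'magnet.?' && echo "magneto: match"
echo magnetic | grep -qxE 'magnet.?' || echo "magnetic: no match"
# If "Magnetic" is stemmed to the indexed term "magnet", the query matches
# that term, which is why title:/magnet.?/ hit "Magnetic folding system".
```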

On Wed, Oct 21, 2015, at 12:08 PM, Bruno Mannina wrote:

title:/magnet.?/ doesn't work for me because solr answers:

|title = "Magnetic folding system"|

but thanks to give me the idea to use regexp !!!

On 21/10/2015 18:46, Upayavira wrote:

No, you cannot tell Solr to handle wildcards differently. However, you
can use regular expressions for searching:

title:/magnet.?/ should do it.

Upayavira

On Wed, Oct 21, 2015, at 11:35 AM, Bruno Mannina wrote:

Dear Solr-user,

I'm surprised to see that in my Solr 5.0 the ? wildcard always replaces
exactly 1 character.

my request is:

title:magnet? AND tire?

Solr only finds titles with a character after magnet and tire, but doesn't
find titles containing only magnet AND tire.


Do you know how I can tell Solr that the ? wildcard means [0, 1]
characters and not exactly [1] character?
Is it possible?


Thanks a lot !

my field in my schema is defined like that:


  Field: title

Field-Type:
  org.apache.solr.schema.TextField
PI Gap:
  100

Flags:       Indexed  Tokenized  Stored  Multivalued
Properties:  y        y          y       y
Schema:      y        y          y       y
Index:       y        y          y


*

  org.apache.solr.analysis.TokenizerChain

*

  org.apache.solr.analysis.TokenizerChain















Wildcard "?" ?

2015-10-21 Thread Bruno Mannina

Dear Solr-user,

I'm surprised to see that in my Solr 5.0 the ? wildcard always replaces
exactly 1 character.

my request is:

title:magnet? AND tire?

Solr only finds titles with a character after magnet and tire, but doesn't
find titles containing only magnet AND tire.


Do you know how I can tell Solr that the ? wildcard means [0, 1]
characters and not exactly [1] character?
Is it possible?


Thanks a lot !

my field in my schema is defined like that:


   Field: title

Field-Type:
   org.apache.solr.schema.TextField
PI Gap:
   100

Flags:       Indexed  Tokenized  Stored  Multivalued
Properties:  y        y          y       y
Schema:      y        y          y       y
Index:       y        y          y


 *

   org.apache.solr.analysis.TokenizerChain

 *

   org.apache.solr.analysis.TokenizerChain






Re: Wildcard "?" ?

2015-10-21 Thread Bruno Mannina

title:/magnet.?/ doesn't work for me because solr answers:

|title = "Magnetic folding system"|

but thanks to give me the idea to use regexp !!!

On 21/10/2015 18:46, Upayavira wrote:

No, you cannot tell Solr to handle wildcards differently. However, you
can use regular expressions for searching:

title:/magnet.?/ should do it.

Upayavira

On Wed, Oct 21, 2015, at 11:35 AM, Bruno Mannina wrote:

Dear Solr-user,

I'm surprised to see that in my Solr 5.0 the ? wildcard always replaces
exactly 1 character.

my request is:

title:magnet? AND tire?

Solr only finds titles with a character after magnet and tire, but doesn't
find titles containing only magnet AND tire.


Do you know how I can tell Solr that the ? wildcard means [0, 1]
characters and not exactly [1] character?
Is it possible?


Thanks a lot !

my field in my schema is defined like that:


 Field: title

Field-Type:
 org.apache.solr.schema.TextField
PI Gap:
 100

Flags:       Indexed  Tokenized  Stored  Multivalued
Properties:  y        y          y       y
Schema:      y        y          y       y
Index:       y        y          y


   *

 org.apache.solr.analysis.TokenizerChain

   *

 org.apache.solr.analysis.TokenizerChain












Possible or not ?

2015-06-05 Thread Bruno Mannina

Dear Solr Users,

I would like to post 1 000 000 records (1 record = 1 file) in one shot,
and do the commit at the end.

Is it possible to do that ?

I've several directories with each 20 000 files inside.
I would like to do:
bin/post -c mydb /DATA

under DATA I have
/DATA/1/*.xml (20 000 files)
/DATA/2/*.xml (20 000 files)
/DATA/3/*.xml (20 000 files)

/DATA/50/*.xml (20 000 files)

At the moment, I post 5 directories at a time (it takes around 1h30 for
100 000 records/files)

But it's Friday and I would like to run it during the weekend unattended.

Thanks for your comment,

Bruno

---
Ce courrier électronique ne contient aucun virus ou logiciel malveillant parce 
que la protection avast! Antivirus est active.
https://www.avast.com/antivirus



Re: Possible or not ?

2015-06-05 Thread Bruno Mannina

Ok, thanks for this information!

On 05/06/2015 17:37, Erick Erickson wrote:

Picking up on Alessandro's point. While you can post all these docs
and commit at the end, unless you do a hard commit (
openSearcher=true or false doesn't matter), then if your server should
abnormally terminate for _any_ reason, all these docs will be
replayed on startup from the transaction log.

I'll also echo Alessandro's point that I don't see the advantage of this.
Personally I'd set my hard commit interval with openSearcher=false
to something like 60000 (60 seconds; it's in milliseconds) and forget
about it. You're not imposing  much extra load on the system, you're
durably saving your progress, you're avoiding really, really, really
long restarts if your server should stop for some reason.

If you don't want the docs to be _visible_ for searches, be sure your
autocommit has openSearcher set to false and disable soft commits
(set the interval to -1 or remove it from your solrconfig).

Best,
Erick
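Erick's settings map onto a solrconfig.xml fragment roughly like this (a sketch; the 60-second hard commit and disabled soft commit are the values he describes above):

```xml
<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <maxTime>60000</maxTime>            <!-- hard commit every 60 seconds -->
    <openSearcher>false</openSearcher>  <!-- durability only: no new searcher -->
  </autoCommit>
  <autoSoftCommit>
    <maxTime>-1</maxTime>               <!-- soft commits disabled during bulk load -->
  </autoSoftCommit>
</updateHandler>
```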

On Fri, Jun 5, 2015 at 8:21 AM, Alessandro Benedetti
benedetti.ale...@gmail.com wrote:

I can't see any problem with that, but talking about commits I would like
to draw a distinction between hard and soft.

Hard commit - durability
Soft commit - visibility

I suggest you this interesting reading :
https://lucidworks.com/blog/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/
It's an old interesting Erick post.

It explains you better what are the differences between different commit
types.

I would put you in this scenario :

Heavy (bulk) indexing

The assumption here is that you’re interested in getting lots of data to
the index as quickly as possible for search sometime in the future. I’m
thinking original loads of a data source etc.

- Set your soft commit interval quite long. As in 10 minutes or even
longer (-1 for no soft commits at all). *Soft commit is about
visibility, *and my assumption here is that bulk indexing isn’t about
near real time searching so don’t do the extra work of opening any kind of
searcher.
- Set your hard commit intervals to 15 seconds, openSearcher=false.
Again the assumption is that you’re going to be just blasting data at Solr.
The worst case here is that you restart your system and have to replay 15
seconds or so of data from your tlog. If your system is bouncing up and
down more often than that, fix the reason for that first.
- Only after you’ve tried the simple things should you consider
refinements, they’re usually only required in unusual circumstances. But
they include:
   - Turning off the tlog completely for the bulk-load operation
   - Indexing offline with some kind of map-reduce process
   - Only having a leader per shard, no replicas for the load, then
   turning on replicas later and letting them do old-style replication to
   catch up. Note that this is automatic, if the node discovers it is “too
   far” out of sync with the leader, it initiates an old-style replication.
   After it has caught up, it’ll get documents as they’re indexed to the
   leader and keep its own tlog.
   - etc.



Actually you could do the commit only at the end, but I can not see any
advantage in that.
I suggest you to play with auto hard/soft commit config and get a better
idea of the situation !

Cheers

2015-06-05 16:08 GMT+01:00 Bruno Mannina bmann...@free.fr:


Hi Alessandro,

I'm currently on my dev computer, so I would like to post 1 000 000 XML
files (with a structure defined in my schema.xml)

I have already import 1 000 000 xml files by using
bin/post -c mydb /DATA0/1 /DATA0/2 /DATA0/3 /DATA0/4 /DATA0/5
where /DATA0/X contains 20 000 xml files (I do it 20 times by just
changing X from 1 to 50)

I would like to do now
bin/post -c mydb /DATA1

I would like to know if my Solr 5 will run fine and not produce a memory
error because there are too many files in one post without doing a commit.

The commit will be done at the end of 1 000 000.

Is it ok ?



On 05/06/2015 16:59, Alessandro Benedetti wrote:


Hi Bruno,
I can not see what is your challenge.
Of course you can index your data in the flavour you want and do a commit
whenever you want…
Are those xml Solr xml ?
If not you would need to use the DIH, the extract update handler or any
custom Indexer application.
Maybe I missed your point…
Give me more details please !

Cheers

2015-06-05 15:41 GMT+01:00 Bruno Mannina bmann...@free.fr:

  Dear Solr Users,

I would like to post 1 000 000 records (1 record = 1 file) in one shot,
and do the commit at the end.

Is it possible to do that ?

I've several directories with each 20 000 files inside.
I would like to do:
bin/post -c mydb /DATA

under DATA I have
/DATA/1/*.xml (20 000 files)
/DATA/2/*.xml (20 000 files)
/DATA/3/*.xml (20 000 files)

/DATA/50/*.xml (20 000 files)

At the moment, I post 5 directories at a time (it takes around 1h30 for
100 000 records/files)

But it's Friday and I

Re: Possible or not ?

2015-06-05 Thread Bruno Mannina

Thanks for the link,

So, I launched this post; I will see on Monday if it went ok :)

On 05/06/2015 17:21, Alessandro Benedetti wrote:

I can't see any problem with that, but talking about commits I would like
to draw a distinction between hard and soft.

Hard commit - durability
Soft commit - visibility

I suggest you this interesting reading :
https://lucidworks.com/blog/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/
It's an old interesting Erick post.

It explains you better what are the differences between different commit
types.

I would put you in this scenario :

Heavy (bulk) indexing

The assumption here is that you’re interested in getting lots of data to
the index as quickly as possible for search sometime in the future. I’m
thinking original loads of a data source etc.

- Set your soft commit interval quite long. As in 10 minutes or even
longer (-1 for no soft commits at all). *Soft commit is about
visibility, *and my assumption here is that bulk indexing isn’t about
near real time searching so don’t do the extra work of opening any kind of
searcher.
- Set your hard commit intervals to 15 seconds, openSearcher=false.
Again the assumption is that you’re going to be just blasting data at Solr.
The worst case here is that you restart your system and have to replay 15
seconds or so of data from your tlog. If your system is bouncing up and
down more often than that, fix the reason for that first.
- Only after you’ve tried the simple things should you consider
refinements, they’re usually only required in unusual circumstances. But
they include:
   - Turning off the tlog completely for the bulk-load operation
   - Indexing offline with some kind of map-reduce process
   - Only having a leader per shard, no replicas for the load, then
   turning on replicas later and letting them do old-style replication to
   catch up. Note that this is automatic, if the node discovers it is “too
   far” out of sync with the leader, it initiates an old-style replication.
   After it has caught up, it’ll get documents as they’re indexed to the
   leader and keep its own tlog.
   - etc.



Actually you could do the commit only at the end, but I can not see any
advantage in that.
I suggest you to play with auto hard/soft commit config and get a better
idea of the situation !

Cheers

2015-06-05 16:08 GMT+01:00 Bruno Mannina bmann...@free.fr:


Hi Alessandro,

I'm currently on my dev computer, so I would like to post 1 000 000 XML
files (with a structure defined in my schema.xml)

I have already import 1 000 000 xml files by using
bin/post -c mydb /DATA0/1 /DATA0/2 /DATA0/3 /DATA0/4 /DATA0/5
where /DATA0/X contains 20 000 xml files (I do it 20 times by just
changing X from 1 to 50)

I would like to do now
bin/post -c mydb /DATA1

I would like to know if my Solr 5 will run fine and not produce a memory
error because there are too many files in one post without doing a commit.

The commit will be done at the end of 1 000 000.

Is it ok ?



On 05/06/2015 16:59, Alessandro Benedetti wrote:


Hi Bruno,
I can't see what your challenge is.
Of course you can index your data in the flavour you want and do a commit
whenever you want…
Are those xml Solr xml ?
If not you would need to use the DIH, the extract update handler or any
custom Indexer application.
Maybe I missed your point…
Give me more details please !

Cheers

2015-06-05 15:41 GMT+01:00 Bruno Mannina bmann...@free.fr:

  Dear Solr Users,

I would like to post 1 000 000 records (1 record = 1 file) in one shot,
and do the commit at the end.

Is it possible to do that ?

I've several directories with each 20 000 files inside.
I would like to do:
bin/post -c mydb /DATA

under DATA I have
/DATA/1/*.xml (20 000 files)
/DATA/2/*.xml (20 000 files)
/DATA/3/*.xml (20 000 files)

/DATA/50/*.xml (20 000 files)

At the moment, I post 5 directories at a time (it takes around 1h30 for
100 000 records/files)

But it's Friday and I would like to run it during the weekend unattended.

Thanks for your comment,

Bruno















Re: Possible or not ?

2015-06-05 Thread Bruno Mannina

Hi Alessandro,

I'm currently on my dev computer, so I would like to post 1 000 000 XML
files (with a structure defined in my schema.xml)


I have already import 1 000 000 xml files by using
bin/post -c mydb /DATA0/1 /DATA0/2 /DATA0/3 /DATA0/4 /DATA0/5
where /DATA0/X contains 20 000 xml files (I do it 20 times by just 
changing X from 1 to 50)


I would like to do now
bin/post -c mydb /DATA1

I would like to know if my Solr 5 will run fine and not produce a memory
error because there are too many files in one post without doing a commit.

The commit will be done at the end of 1 000 000.

Is it ok ?


On 05/06/2015 16:59, Alessandro Benedetti wrote:

Hi Bruno,
I can't see what your challenge is.
Of course you can index your data in the flavour you want and do a commit
whenever you want…
Are those xml Solr xml ?
If not you would need to use the DIH, the extract update handler or any
custom Indexer application.
Maybe I missed your point…
Give me more details please !

Cheers

2015-06-05 15:41 GMT+01:00 Bruno Mannina bmann...@free.fr:


Dear Solr Users,

I would like to post 1 000 000 records (1 record = 1 file) in one shot,
and do the commit at the end.

Is it possible to do that ?

I've several directories with each 20 000 files inside.
I would like to do:
bin/post -c mydb /DATA

under DATA I have
/DATA/1/*.xml (20 000 files)
/DATA/2/*.xml (20 000 files)
/DATA/3/*.xml (20 000 files)

/DATA/50/*.xml (20 000 files)

At the moment, I post 5 directories at a time (it takes around 1h30 for
100 000 records/files)

But it's Friday and I would like to run it during the weekend unattended.

Thanks for your comment,

Bruno











Help for a field in my schema ?

2015-05-29 Thread Bruno Mannina

Dear Solr-Users,

(SOLR 5.0 Ubuntu)

I have XML files with tags like this:
<claimXXYYY>

where XX is a language code like FR EN DE PT etc... (I don't know the
number of language code I can have)
and YYY is a number [1..999]

i.e.:
<claimen1>
<claimen2>
<claimen3>
<claimfr1>
<claimfr2>
<claimfr3>

I would like to define fields named:
*claimen* equal to claimenYYY (EN language, all numbers, indexed=true,
stored=true) (search needed and must be displayed)
*claim* equal to all claimXXYYY (all languages, all numbers,
indexed=true, stored=false) (search not needed but must be displayed)

Is it possible to have these 2 fields ?

Could you help me to declare them in my schema.xml ?

Thanks a lot for your help !

Bruno





Re: How to index 20 000 files with a command line ?

2015-05-29 Thread Bruno Mannina

oh yes like this:

find /data/hbl-201522/ -name '*.xml' -exec bin/post -c hbl {} \;

?

On 29/05/2015 14:15, Sergey Shvets wrote:

Hello Bruno,

You can use the find command with the -exec option.

regards
  Sergey

Friday, May 29, 2015, 3:11:37 PM, you wrote:

Dear Solr Users,

Habitually I use this command line to index my files:
  bin/post -c hbl /data/hbl-201522/*.xml

but today I have a big update, so there are 20 000 XML files (each file
1 KB to 150 KB)

I get this error:
Error: bin/post argument too long

How could I index the whole directory ?

Thanks a lot for your help,

Solr 5.0 - Ubuntu

Bruno
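A sketch of Sergey's find-based approach that batches files instead of forking bin/post once per file (the batch size of 500 is an assumption):

```shell
# Real-world form, with the core and path from this thread:
#   find /data/hbl-201522 -name '*.xml' -print0 | xargs -0 -n 500 bin/post -c hbl
# Unlike -exec ... \; (one bin/post invocation per file), xargs groups the
# file names into batches, so the argv limit is never hit either.
# Self-contained demonstration with throwaway files and echo standing in
# for bin/post:
dir=$(mktemp -d)
for i in $(seq 1 25); do : > "$dir/doc$i.xml"; done
batches=$(find "$dir" -name '*.xml' -print0 | xargs -0 -n 10 echo | wc -l | tr -d ' ')
echo "batches=$batches"    # 25 files in batches of 10 -> 3 invocations
rm -rf "$dir"
```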











How to index 20 000 files with a command line ?

2015-05-29 Thread Bruno Mannina

Dear Solr Users,

Habitually I use this command line to index my files:
bin/post -c hbl /data/hbl-201522/*.xml

but today I have a big update, so there are 20 000 XML files (each file
1 KB to 150 KB)

I get this error:
Error: bin/post argument too long

How could I index the whole directory ?

Thanks a lot for your help,

Solr 5.0 - Ubuntu

Bruno




Re: Solr 5.0 - uniqueKey case insensitive ?

2015-05-06 Thread Bruno Mannina

Yes thanks it's now for me too.

Daniel, my pn is always in uppercase and I index them always in uppercase.
the problem (solved now after all your answers, thanks) was the request, 
if users

requests with lowercase then solr reply no result and it was not good.

But now the problem is solved: in my source file I renamed the pn field to
id, and in my schema I use a copyField named pn, and it works perfectly.

Thanks a lot !!!

On 06/05/2015 09:44, Daniel Collins wrote:

Ah, I remember seeing this when we first started using Solr (which was 4.0
because we needed Solr Cloud), I never got around to filing an issue for it
(oops!), but we have a note in our schema to leave the key field a normal
string (like Bruno we had tried to lowercase it which failed).
We didn't really know Solr in those days, and hadn't really thought about
it since then, but Hoss' and Erick's explanations make perfect sense now!

Since shard routing is (basically) done on hashes of the unique key, if I
have 2 documents which are the same, but have values HELLO and hello,
they might well hash to completely different shards, so the update
logistics would be horrible.

Bruno, why do you need to lowercase at all then?  You said in your example,
that your client application always supplies pn and it is always
uppercase, so presumably all adds/updates could be done directly on that
field (as a normal string with no lowercasing).  Where does the case
insensitivity come in, is that only for searching?  If so couldn't you add
a search field (called id), and update your app to search using that (or
make that your default search field, I guess it depends if your calling app
explicitly uses the pn field name in its searches).
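Daniel's suggestion amounts to a schema fragment like this (a sketch; pn_ci is an illustrative name, and string_ci is the KeywordTokenizer + LowerCaseFilter type Bruno quoted later in the thread):

```xml
<!-- The uniqueKey stays a plain string, so routing and overwrites are
     case-sensitive and predictable; searching goes through a lowercased
     copy instead. Field names here are assumptions, not the poster's. -->
<field name="pn" type="string" indexed="true" stored="true" required="true"/>
<field name="pn_ci" type="string_ci" indexed="true" stored="false"/>
<copyField source="pn" dest="pn_ci"/>
<uniqueKey>pn</uniqueKey>
```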


On 6 May 2015 at 01:55, Erick Erickson erickerick...@gmail.com wrote:


Well, working fine may be a bit of an overstatement. That has never
been officially supported, so it just happened to work in 3.6.

As Chris points out, if you're using SolrCloud then this will _not_
work as routing happens early in the process, i.e. before the analysis
chain gets the token so various copies of the doc will exist on
different shards.

Best,
Erick

On Mon, May 4, 2015 at 4:19 PM, Bruno Mannina bmann...@free.fr wrote:

Hello Chris,

yes, I confirm that on my Solr 3.6 it has worked fine for several years,
and each doc added with the same code is updated, not added.

To be more clear, I receive docs with a field name pn and it's the
uniqueKey, and it is always in uppercase

so I must define in my schema.xml

<field name="id" type="string" multiValued="false" indexed="true"
       required="true" stored="true"/>
<field name="pn" type="text_general" multiValued="true" indexed="true"
       stored="false"/>
...
<uniqueKey>id</uniqueKey>
...
<copyField source="id" dest="pn"/>

But the application that uses Solr already exists, so it queries the pn
field, not id; I cannot change that.
And in each doc I receive there is no id field, just a pn field, and I
cannot change that either.

So there is a problem, no? I must import an id field and query a pn field,
but I only have a pn field at import time...



On 05/05/2015 01:00, Chris Hostetter wrote:

: On SOLR3.6, I defined a string_ci field like this:
:
: <fieldType name="string_ci" class="solr.TextField"
:            sortMissingLast="true" omitNorms="true">
:   <analyzer>
:     <tokenizer class="solr.KeywordTokenizerFactory"/>
:     <filter class="solr.LowerCaseFilterFactory"/>
:   </analyzer>
: </fieldType>
:
: <field name="pn" type="string_ci" multiValued="false" indexed="true"
:        required="true" stored="true"/>


I'm really suprised that field would have worked for you (reliably) as a
uniqueKey field even in Solr 3.6.

the best practice for something like what you describe has always (going
back to Solr 1.x) been to use a copyField to create a case insensitive
copy of your uniqueKey for searching.

if, for some reason, you really want case insensitive *updates* (so a doc
with id "foo" overwrites a doc with id "FOO") then the only reliable way to
make something like that work is to do the lowercasing in an
UpdateProcessor to ensure it happens *before* the docs are distributed to
the correct shard, and so the correct existing doc is overwritten (even if
you aren't using solr cloud)
you aren't using solr cloud)



-Hoss
http://www.lucidworks.com/




---
Ce courrier électronique ne contient aucun virus ou logiciel malveillant
parce que la protection avast! Antivirus est active.
http://www.avast.com







Solr 5.0, Ubuntu 14.04, SOLR_JAVA_MEM problem

2015-05-04 Thread Bruno Mannina

Dear Solr Community,

I have a recent computer with 8Go RAM, I installed Ubuntu 14.04 and SOLR 
5.0, Java 7

This is a brand new installation.

Everything works fine, but I would like to increase SOLR_JAVA_MEM (to 40% of
the total RAM available).

So I edit the bin/solr.in.sh

# Increase Java Min/Max Heap as needed to support your indexing / query 
needs

SOLR_JAVA_MEM=-Xms3g –Xmx3g -XX:MaxPermSize=512m -XX:PermSize=512m

but with this param the Solr server can't be started. I use:
bin/solr start

Do you have an idea what the problem is?

Thanks a lot for your comment,
Bruno




Delete document stop my solr 5.0 ?!

2015-05-04 Thread Bruno Mannina

Dear Solr Users,

I have a brand new computer where I installed Ubuntu 14.04, with 8 GB RAM,
SOLR 5.0, Java 7.
I indexed 92 000 000 docs (little text files, ~2 KB each).
I have around 30 fields.

All works fine, but each Tuesday I need to delete some docs, so I
create a batch file
with lines like this:
/home/solr/solr-5.0.0/bin/post -c docdb -commit no -d "<delete><query>f1:58644</query></delete>"
/home/solr/solr-5.0.0/bin/post -c docdb -commit no -d "<delete><query>f1:162882</query></delete>"
..
.
/home/solr/solr-5.0.0/bin/post -c docdb -commit yes -d "<delete><query>f1:2868668</query></delete>"

my f1 field is my key field. It is unique.

But if my file contains more than one or two hundred lines, my Solr
shuts down. Two hundred lines always shut down Solr 5.0.
I have no error in my console; Solr just can't be reached on port 8983.

Does a variable exist that I must increase to avoid this error?

On my old Solr 3.6 I didn't use the same line to delete documents; I used:
java -jar -Ddata=args -Dcommit=no post.jar "<delete><id>113422</id></delete>"

You can see that I used the id directly, not a query, and my schema is
almost the same between Solr 3.6 and Solr 5.0.
I just have some more fields.
Why does this method not work now?

Thanks a lot,
Bruno
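A possible workaround (a sketch, hedged: untested against this index) is to avoid launching one bin/post process per deletion. Solr's XML update format accepts many <query> (or <id>) elements inside a single <delete> element, so the whole Tuesday batch can go into one file:

```xml
<!-- deletes.xml (illustrative): one request instead of hundreds -->
<delete>
  <query>f1:58644</query>
  <query>f1:162882</query>
  <query>f1:2868668</query>
</delete>
```

posted once with something like bin/post -c docdb -commit yes deletes.xml. Since f1 is the uniqueKey, <id>58644</id>-style deletes should also work and are typically cheaper than delete-by-query.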





Re: Delete document stop my solr 5.0 ?!

2015-05-04 Thread Bruno Mannina

ok I have this OOM error in the log file ...

#
# java.lang.OutOfMemoryError: Java heap space
# -XX:OnOutOfMemoryError=/home/solr/solr-5.0.0/bin/oom_solr.sh 8983 /home/solr/solr-5.0.0/server/logs
#   Executing /bin/sh -c /home/solr/solr-5.0.0/bin/oom_solr.sh 8983 /home/solr/solr-5.0.0/server/logs...

Running OOM killer script for process 28233 for Solr on port 8983
Killed process 28233

I try in few minutes to increase the

formdataUploadLimitInKB

and I will tell you the result.

Le 04/05/2015 14:58, Shawn Heisey a écrit :

On 5/4/2015 3:19 AM, Bruno Mannina wrote:

All work fine but each Tuesday I need to delete some docs inside, so I
create a batch file
with inside line like this:
/home/solr/solr-5.0.0/bin/post -c docdb -commit no -d "<delete><query>f1:58644</query></delete>"
/home/solr/solr-5.0.0/bin/post -c docdb -commit no -d "<delete><query>f1:162882</query></delete>"
..
.
/home/solr/solr-5.0.0/bin/post -c docdb -commit yes -d "<delete><query>f1:2868668</query></delete>"

my f1 field is my key field. It is unique.

But if my file contains more than one or two hundreds line, my solr
shutdown.
Two hundreds line shutdown always solr 5.0.
I have no error in my console, just Solr can't be reach on the port 8983.

Is exists a variable that I must increase to disable this error ?

As far as I know, the only limit that can affect that is the maximum
post size.  Current versions of Solr default to a 2MB max post size,
using the formdataUploadLimitInKB attribute on the requestParsers
element in solrconfig.xml, which defaults to 2048.

Even if that limit is exceeded by a request, it should not crash Solr,
it should simply log an error and ignore the request.  It would be a bug
if Solr does crash.

What happens if you increase that limit?  Are you seeing any error
messages in the Solr logfile when you send that delete request?

Thanks,
Shawn









Re: Delete document stop my solr 5.0 ?!

2015-05-04 Thread Bruno Mannina

I increase the

formdataUploadLimitInKB

to 2048000 and the problem is the same, same error

an idea ?



Le 04/05/2015 16:38, Bruno Mannina a écrit :

ok I have this OOM error in the log file ...

#
# java.lang.OutOfMemoryError: Java heap space
# -XX:OnOutOfMemoryError=/home/solr/solr-5.0.0/bin/oom_solr.sh 
8983/home/solr/solr-5.0.0/server/logs
#   Executing /bin/sh -c /home/solr/solr-5.0.0/bin/oom_solr.sh 
8983/home/solr/solr-5.0.0/server/logs...

Running OOM killer script for process 28233 for Solr on port 8983
Killed process 28233

I try in few minutes to increase the

formdataUploadLimitInKB

and I will tell you the result.

Le 04/05/2015 14:58, Shawn Heisey a écrit :

On 5/4/2015 3:19 AM, Bruno Mannina wrote:

All work fine but each Tuesday I need to delete some docs inside, so I
create a batch file
with inside line like this:
/home/solr/solr-5.0.0/bin/post -c docdb -commit no -d "<delete><query>f1:58644</query></delete>"
/home/solr/solr-5.0.0/bin/post -c docdb -commit no -d "<delete><query>f1:162882</query></delete>"
..
.
/home/solr/solr-5.0.0/bin/post -c docdb -commit yes -d "<delete><query>f1:2868668</query></delete>"

my f1 field is my key field. It is unique.

But if my file contains more than one or two hundreds line, my solr
shutdown.
Two hundreds line shutdown always solr 5.0.
I have no error in my console, just Solr can't be reach on the port 
8983.


Is exists a variable that I must increase to disable this error ?

As far as I know, the only limit that can affect that is the maximum
post size.  Current versions of Solr default to a 2MB max post size,
using the formdataUploadLimitInKB attribute on the requestParsers
element in solrconfig.xml, which defaults to 2048.

Even if that limit is exceeded by a request, it should not crash Solr,
it should simply log an error and ignore the request.  It would be a bug
if Solr does crash.

What happens if you increase that limit?  Are you seeing any error
messages in the Solr logfile when you send that delete request?

Thanks,
Shawn















Re: Solr 5.0, Ubuntu 14.04, SOLR_JAVA_MEM problem

2015-05-04 Thread Bruno Mannina

Yes! It works!!!

Scott, perfect.

For my config 3g does not work, but 2g does!

Thanks
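For reference, Scott's diagnosis below is easy to check mechanically: the character before Xmx3g is a typographic en dash (U+2013), not the ASCII hyphen-minus the JVM option parser expects. A small sketch (plain Python, not part of Solr) that flags such look-alikes in a config line:

```python
def find_bad_dashes(line):
    """Return (index, name) pairs for dash look-alikes that are not the
    ASCII hyphen-minus expected in JVM options like -Xmx3g."""
    lookalikes = {
        "\u2013": "en dash",
        "\u2014": "em dash",
        "\u2212": "minus sign",
    }
    return [(i, lookalikes[c]) for i, c in enumerate(line) if c in lookalikes]

# The broken line from solr.in.sh, with the pasted en dash:
line = "SOLR_JAVA_MEM=\"-Xms3g \u2013Xmx3g\""
print(find_bad_dashes(line))  # -> [(22, 'en dash')]
```

Running this over a whole solr.in.sh would point straight at the offending character.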

Le 04/05/2015 16:50, Scott Dawson a écrit :

Bruno,
You have the wrong kind of dash (a long dash) in front of the Xmx flag.
Could that be causing a problem?

Regards,
Scott

On Mon, May 4, 2015 at 5:06 AM, Bruno Mannina bmann...@free.fr wrote:


Dear Solr Community,

I have a recent computer with 8Go RAM, I installed Ubuntu 14.04 and SOLR
5.0, Java 7
This is a brand new installation.

all work fine but I would like to increase the JAVA_MEM_SOLR (40% of total
RAM available).
So I edit the bin/solr.in.sh

# Increase Java Min/Max Heap as needed to support your indexing / query
needs
SOLR_JAVA_MEM=-Xms3g –Xmx3g -XX:MaxPermSize=512m -XX:PermSize=512m

but with this param, the solr server can't be start, I use:
bin/solr start

Do you have an idea of the problem ?

Thanks a lot for your comment,
Bruno









Re: Delete document stop my solr 5.0 ?!

2015-05-04 Thread Bruno Mannina
Yes, it was that! I increased SOLR_JAVA_MEM to 2g (with 8 GB RAM I could do
more, but 3g fails to run Solr on my brand new computer)


thanks !

Le 04/05/2015 17:03, Shawn Heisey a écrit :

On 5/4/2015 8:38 AM, Bruno Mannina wrote:

ok I have this OOM error in the log file ...

#
# java.lang.OutOfMemoryError: Java heap space
# -XX:OnOutOfMemoryError=/home/solr/solr-5.0.0/bin/oom_solr.sh
8983/home/solr/solr-5.0.0/server/logs
#   Executing /bin/sh -c /home/solr/solr-5.0.0/bin/oom_solr.sh
8983/home/solr/solr-5.0.0/server/logs...
Running OOM killer script for process 28233 for Solr on port 8983

Out Of Memory errors are a completely different problem.  Solr behavior
is completely unpredictable after an OutOfMemoryError exception, so the
5.0 install includes a script to run on OOME that kills Solr.  It's the
only safe way to handle that problem.

Your Solr install is not being given enough Java heap memory for what it
is being asked to do.  You need to increase the heap size for Solr.  If
you look at the admin UI for Solr in a web browser, you can see what the
max heap is set to ... on a default 5.0 install running Solr with
bin/solr the max heap will be 512m ... which is VERY small.  Try using
bin/solr with the -m option, set to something like 2g (for 2 gigabytes
of heap).

Thanks,
Shawn









Re: Solr 5.0, Ubuntu 14.04, SOLR_JAVA_MEM problem

2015-05-04 Thread Bruno Mannina

Shawn, thanks a lot for this comment.

So, I have this information; it says nothing about 32 or 64 bits...

solr@linux:~$ java -version
java version 1.7.0_79
OpenJDK Runtime Environment (IcedTea 2.5.5) (7u79-2.5.5-0ubuntu0.14.04.2)
OpenJDK Server VM (build 24.79-b02, mixed mode)
solr@linux:~$

solr@linux:~$ uname -a
Linux linux 3.13.0-51-generic #84-Ubuntu SMP Wed Apr 15 12:11:46 UTC 
2015 i686 i686 i686 GNU/Linux

solr@linux:~$

Do I need to install a new version of Java? I installed my Ubuntu just
one week ago :)

Updates are up to date.

Le 04/05/2015 17:23, Shawn Heisey a écrit :

On 5/4/2015 9:09 AM, Bruno Mannina wrote:

Yes ! it works !!!

Scott perfect 

For my config 3g do not work, but 2g yes !

If you can't start Solr with a 3g heap, chances are that you are running
a 32-bit version of Java.  A 32-bit Java cannot go above a 2GB heap.  A
64-bit JVM requires a 64-bit operating system, which requires a 64-bit
CPU.  Since 2006, Intel has only been providing 64-bit chips to the
consumer market, and getting a 32-bit chip in a new computer has gotten
extremely difficult.  The server market has had only 64-bit chips from
Intel since 2005.  I am not sure what those dates look like for AMD
chips, but it is probably similar.

Running java -version should give you enough information to determine
whether your Java is 32-bit or 64-bit.  This is the output from that
command on a Linux machine that is running a 64-bit JVM from Oracle:

root@idxa4:~# java -version
java version 1.8.0_45
Java(TM) SE Runtime Environment (build 1.8.0_45-b14)
Java HotSpot(TM) 64-Bit Server VM (build 25.45-b02, mixed mode)

If you are running Solr on Linux, then the output of uname -a should
tell you whether your operating system is 32 or 64 bit.

Thanks,
Shawn









Re: Solr 5.0, Ubuntu 14.04, SOLR_JAVA_MEM problem

2015-05-04 Thread Bruno Mannina

OK, I have noted all this information, thanks!

I will upgrade if needed; 2g seems to be OK.

Le 04/05/2015 18:46, Shawn Heisey a écrit :

On 5/4/2015 10:28 AM, Bruno Mannina wrote:

solr@linux:~$ java -version
java version 1.7.0_79
OpenJDK Runtime Environment (IcedTea 2.5.5) (7u79-2.5.5-0ubuntu0.14.04.2)
OpenJDK Server VM (build 24.79-b02, mixed mode)
solr@linux:~$

solr@linux:~$ uname -a
Linux linux 3.13.0-51-generic #84-Ubuntu SMP Wed Apr 15 12:11:46 UTC
2015 i686 i686 i686 GNU/Linux
solr@linux:~$

Both Linux and Java are 32-bit.  For linux, I know this because your
arch is i686, which means it is coded for a newer generation 32-bit
CPU.  You can't be running a 64-bit Java, and the Java version confirms
that because it doesn't contain 64-bit.

Run this command:

cat /proc/cpuinfo

If the flags on the CPU contain the string lm (long mode), then your
CPU is capable of running a 64-bit (sometimes known as amd64 or x86_64)
version of Linux, and a 64-bit Java.  You will need to re-install both
Linux and Java to get this capability.

Here's uname -a from a 64-bit version of Ubuntu:

Linux lb1 3.13.0-51-generic #84-Ubuntu SMP Wed Apr 15 12:08:34 UTC 2015
x86_64 x86_64 x86_64 GNU/Linux

Since you are running 5.0, I would recommend Oracle Java 8.

http://www.webupd8.org/2012/09/install-oracle-java-8-in-ubuntu-via-ppa.html

Thanks,
Shawn
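Shawn's "lm" flag check above can be automated; a small sketch (plain Python, illustrative only) of inspecting /proc/cpuinfo content:

```python
def cpu_supports_64bit(cpuinfo_text):
    """Return True if a /proc/cpuinfo dump shows the 'lm' (long mode)
    flag, i.e. an x86 CPU capable of running a 64-bit OS and JVM."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            flags = line.split(":", 1)[1].split()
            return "lm" in flags
    return False

# 64-bit capable CPU: the flags line contains "lm"
sample = "processor\t: 0\nflags\t\t: fpu vme de pse msr lm constant_tsc"
print(cpu_supports_64bit(sample))  # -> True
```

On a real machine this would be fed open("/proc/cpuinfo").read().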









Solr 5.0 - uniqueKey case insensitive ?

2015-05-04 Thread Bruno Mannina

Dear Solr users,

I have a problem with SOLR5.0 (and not on SOLR3.6)

What kind of field can I use for my uniqueKey field named code if I
want it case insensitive ?

On SOLR3.6, I defined a string_ci field like this:

<fieldType name="string_ci" class="solr.TextField"
           sortMissingLast="true" omitNorms="true">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<field name="pn" type="string_ci" multiValued="false" indexed="true"
       required="true" stored="true"/>

and it works fine.
- If I add a document with the same code then the doc is updated.
- If I search a document with lower or upper case, the doc is found


But in SOLR5.0, if I use this definition then :
- I can search in lower/upper case, it's OK
- BUT if I add a doc with the same code then the doc is added not updated !?

I read that the problem could be that this field type is tokenized
instead of being a plain string.

If I change from string_ci to string, then:
- I lose the possibility to search in lower/upper case
- but updating the doc works fine.

So, could you help me to find the right field type to:

- search in case insensitive
- if I add a document with the same code, the old doc will be updated

Thanks a lot !





Re: Solr 5.0 - uniqueKey case insensitive ?

2015-05-04 Thread Bruno Mannina

Hello Chris,

yes, I confirm that on my SOLR3.6 it has worked fine for several years: each
doc added with the same code is updated, not added.


To be more clear, I receive docs with a field named pn; it is the
uniqueKey, and it is always in uppercase,


so I must define in my schema.xml

<field name="id" type="string" multiValued="false" indexed="true"
       required="true" stored="true"/>
<field name="pn" type="text_general" multiValued="true"
       indexed="true" stored="false"/>

...
   <uniqueKey>id</uniqueKey>
...
  <copyField source="id" dest="pn"/>

but the application that uses Solr already exists, so it queries the pn
field, not id; I cannot change that.
And in each doc I receive there is no id field, just a pn field, and I
cannot change that either.


So there is a problem, no? I would have to import an id field but query a pn
field, while I only have a pn field for import...



Le 05/05/2015 01:00, Chris Hostetter a écrit :

: On SOLR3.6, I defined a string_ci field like this:
:
: <fieldType name="string_ci" class="solr.TextField"
:            sortMissingLast="true" omitNorms="true">
:   <analyzer>
:     <tokenizer class="solr.KeywordTokenizerFactory"/>
:     <filter class="solr.LowerCaseFilterFactory"/>
:   </analyzer>
: </fieldType>
:
: <field name="pn" type="string_ci" multiValued="false" indexed="true"
:        required="true" stored="true"/>


I'm really suprised that field would have worked for you (reliably) as a
uniqueKey field even in Solr 3.6.

the best practice for something like what you describe has always (going
back to Solr 1.x) been to use a copyField to create a case insensitive
copy of your uniqueKey for searching.

if, for some reason, you really want case insensitive *updates* (so a doc
with id "foo" overwrites a doc with id "FOO"), then the only reliable way to
make something like that work is to do the lowercasing in an
UpdateProcessor to ensure it happens *before* the docs are distributed to
the correct shard, and so the correct existing doc is overwritten (even if
you aren't using SolrCloud)



-Hoss
http://www.lucidworks.com/








Solr5.0.0, do a commit alone ?

2015-04-21 Thread Bruno Mannina

Dear Solr Users,

With Solr3.6, when I want to force a commit without giving data, I do:
java -jar post.jar

Now with Solr 5.0.0 I use
bin/post .

but it does not accept a commit if I don't give a data directory, i.e.:
bin/post -c mydb -commit yes

I want to do that because I have a file with delete actions.
Each line in this file contains one ref to delete:
bin/post -c mydb -commit no -d "<delete>...</delete>"
So I would like to do the commit only after running my file, with a
command line.

bin/post -c mydb -commit yes (without data) is not accepted by post

Thanks,
Sincerely,
Bruno
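An alternative worth noting (hedged: this relies on Solr's standard XML update handler, not on bin/post, and the URL/core name below are examples): an explicit commit can be sent to the core's /update endpoint as a tiny XML body, or simply via the commit=true request parameter, with no data at all.

```xml
<!-- POST this body to http://localhost:8983/solr/mydb/update
     with Content-Type: text/xml (URL and core name are examples) -->
<commit/>
```

This makes "commit only, no documents" a single HTTP request independent of the post tool.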







Correspondance table ?

2015-04-20 Thread Bruno Mannina

Dear Solr Users,

Solr 5.0.0

I currently have around 90 000 000 docs in my Solr, and I have a field
with one char which represents a category, i.e.:
value = a, definition: nature and health
etc...
I have a few categories, around 15.

These category definitions can change over the years.

Can I use a file where I will have
a\tNature and Health
b\tComputer science
etc...

and, instead of having the code letter in my Solr JSON result, have the
definition? Only in the result.
The query will still be done with the code letter.

I'm sure it's possible!

Additional question: is it possible to do that also with a big
correspondence file, around 5000 definitions?

Thanks for your help,
Bruno
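While waiting for a server-side answer, the code-to-definition mapping can also be applied client-side after each query. A sketch (plain Python, illustrative: the field name "cat" and the doc structure are assumptions, not from the actual index):

```python
import csv
import io

def load_code_map(tsv_text):
    """Parse 'code<TAB>definition' lines into a dict; scales fine to ~5000 rows."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    return {row[0]: row[1] for row in reader if len(row) == 2}

def expand_codes(docs, code_map, field="cat"):
    """Replace category codes with definitions in Solr result docs.
    Unknown codes are left as-is."""
    return [{**d, field: code_map.get(d.get(field), d.get(field))} for d in docs]

code_map = load_code_map("a\tNature and Health\nb\tComputer science")
print(expand_codes([{"id": 1, "cat": "a"}], code_map))
# -> [{'id': 1, 'cat': 'Nature and Health'}]
```

This keeps querying by code letter while displaying the current definition, and updating the definitions is just swapping the TSV file.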




Re: Solr 5.0, defaultSearchField, defaultOperator ?

2015-04-18 Thread Bruno Mannina

Thx Chris  Ahmet !

Le 17/04/2015 23:56, Chris Hostetter a écrit :

: df and q.op are the ones you are looking for.
: You can define them in defaults section.

specifically...

https://cwiki.apache.org/confluence/display/solr/InitParams+in+SolrConfig


:
: Ahmet
:
:
:
: On Friday, April 17, 2015 9:18 PM, Bruno Mannina bmann...@free.fr wrote:
: Dear Solr users,
:
: Since today I used SOLR 5.0 (I used solr 3.6) so i try to adapt my old
: schema for solr 5.0.
:
: I have two questions:
: - how can I set the defaultSearchField ?
: I don't want to use in the query the df tag  because I have a lot of
: modification to do for that on my web project.
:
: - how can I set the defaultOperator (and|or) ?
:
: It seems that these options are now deprecated in SOLR 5.0 schema.
:
: Thanks a lot for your comment,
:
: Regards,
: Bruno
:
:

-Hoss
http://www.lucidworks.com/






Solr 5.0, defaultSearchField, defaultOperator ?

2015-04-17 Thread Bruno Mannina

Dear Solr users,

As of today I use SOLR 5.0 (I used Solr 3.6), so I am trying to adapt my old
schema for Solr 5.0.

I have two questions:
- how can I set the defaultSearchField?
I don't want to use the df parameter in the query because that would mean a
lot of modifications to my web project.

- how can I set the defaultOperator (and|or)?

It seems that these options are now deprecated in the SOLR 5.0 schema.

Thanks a lot for your comment,

Regards,
Bruno




Re: Solr 3.6, Highlight and multi words?

2015-04-01 Thread Bruno Mannina
Sorry to disturb you with this reminder, but does nobody use multi-term
highlighting, or have problems with it?


regards,

Le 29/03/2015 21:15, Bruno Mannina a écrit :

Dear Solr User,

I am trying to use highlighting; it works well, but only if I have a single
keyword in my query?!

If my request is plastic AND bicycle, then only plastic is highlighted.

my request is:

./select/?q=ab%3A%28plastic+and+bicycle%29&version=2.2&start=0&rows=10&indent=on&hl=true&hl.fl=tien,aben&fl=pn&f.aben.hl.snippets=5



Could you please help me understand? I read the docs and googled, without
success...

so I am posting here...

my result is:



 <lst name="DE202010012045U1">
   <arr name="aben">
     <str>(EP2423092A1) #CMT# #/CMT# The bicycle pedal has a pedal
body (10) made from <em>plastic</em> material</str>
     <str>, particularly for touring bike. #CMT#ADVANTAGE : #/CMT#
The bicycle pedal has a pedal body made
from <em>plastic</em></str>
   </arr>
 </lst>
 <lst name="JP2014091382A">
   <arr name="aben">
     <str> between <em>plastic</em> tapes 3 and 3 having
two heat fusion layers, and the two <em>plastic</em> tapes
3 and 3 are stuck</str>
   </arr>
 </lst>
 <lst name="DE10201740A1">
   <arr name="aben">
     <str> elements. A connecting element is formed as a hinge, a
flexible foil or a flexible <em>plastic</em> part.
#CMT#USE</str>
   </arr>
 </lst>
 <lst name="US2008276751A1">
   <arr name="aben">
     <str>A bicycle handlebar grip includes an inner fiber layer and
an outer <em>plastic</em> layer. Thus, the fiber</str>
     <str> handlebar grip, while the <em>plastic</em>
layer is soft and has an adjustable thickness to provide a
comfortable</str>
     <str> sensation to a user. In addition,
the <em>plastic</em> layer includes a holding portion
coated on the outer surface</str>
     <str> layer to enhance the combination strength between the
fiber layer and the <em>plastic</em> layer and to
enhance</str>
   </arr>
 </lst>











Re: Solr 3.6, Highlight and multi words?

2015-04-01 Thread Bruno Mannina

Dear Charles,

Thanks for your answer, please find below my answers.

OK, it works if I use aben as the field in my query, as you say in answer 1.
It doesn't work if I use ab, maybe because the ab field is a copyField
target for abfr, aben, abit, abpt.


Concerning point 2, you are right: it's not and but AND.

I have this result:

<lst name="DE102009043935B3">
  <arr name="tien">
    <str><em>Bicycle</em> frame comprises holder, particularly for
water bottle, where holder is connected</str>
  </arr>
  <arr name="aben">
    <str>#CMT# #/CMT# The <em>bicycle</em> frame (7) comprises a holder
(1), particularly for a water bottle</str>
    <str>. The holder is connected with the <em>bicycle</em> frame by a
screw (5), where a mounting element has a compensation</str>
    <str> section which is made of an elastic material, particularly
a <em>plastic</em> material. The compensation section</str>
  </arr>
</lst>


So my last question is: why do I get <em></em> instead of the colored tags?
How can I tell Solr to use the colored ones?

Thanks a lot,
Bruno


Le 01/04/2015 17:15, Reitzel, Charles a écrit :

Haven't used Solr 3.x in a long time.  But with 4.10.x, I haven't had any 
trouble with multiple terms.  I'd look at a few things.

1.  Do you have a typo in your query?  Shouldn't it be q=aben:(plastic and 
bicycle)?

^^
2. Try removing the word and from the query.  There may be some interaction 
with a stop word filter.  If you want a phrase query, wrap it in quotes.

3.  Also, be sure that the query and indexing analyzers for the aben field are 
compatible with each other.

-Original Message-
From: Bruno Mannina [mailto:bmann...@free.fr]
Sent: Wednesday, April 01, 2015 7:05 AM
To: solr-user@lucene.apache.org
Subject: Re: Solr 3.6, Highlight and multi words?

Sorry to disturb you with the renew but nobody use or have problem with 
multi-terms and highlight ?

regards,

Le 29/03/2015 21:15, Bruno Mannina a écrit :

Dear Solr User,

I try to work with highlight, it works well but only if I have only
one keyword in my query?!
If my request is plastic AND bicycle then only plastic is highlight.

my request is:

./select/?q=ab%3A%28plastic+and+bicycle%29&version=2.2&start=0&rows=10&indent=on&hl=true&hl.fl=tien,aben&fl=pn&f.aben.hl.snippets=5


Could you help me please to understand ? I read doc, google, without
success...
so I post here...

my result is:



  lst  name=DE202010012045U1
 arr  name=aben
   str(EP2423092A1) #CMT# #/CMT# The bicycle pedal has a pedal
body (10) made fromlt;emgt;plasticlt;/emgt; material/str
   str, particularly for touring bike. #CMT#ADVANTAGE : #/CMT#
The bicycle pedal has a pedal body made
fromlt;emgt;plasticlt;/emgt;/str
 /arr
   /lst
   lst  name=JP2014091382A
 arr  name=aben
   str betweenlt;emgt;plasticlt;/emgt;  tapes 3 and 3 having
two heat fusion layers, and the twolt;emgt;plasticlt;/emgt;  tapes
3 and 3 are stuck/str
 /arr
   /lst
   lst  name=DE10201740A1
 arr  name=aben
   str  elements. A connecting element is formed as a hinge, a
flexible foil or a flexiblelt;emgt;plasticlt;/emgt;  part.
#CMT#USE/str
 /arr
   /lst
   lst  name=US2008276751A1
 arr  name=aben
   strA bicycle handlebar grip includes an inner fiber layer and
an outerlt;emgt;plasticlt;/emgt; layer. Thus, the fiber/str
   str  handlebar grip, while thelt;emgt;plasticlt;/emgt;
layer is soft and has an adjustable thickness to provide a
comfortable/str
   str  sensation to a user. In addition,
thelt;emgt;plasticlt;/emgt;  layer includes a holding portion
coated on the outer surface/str
   str  layer to enhance the combination strength between the
fiber layer and thelt;emgt;plasticlt;/emgt;  layer and to
enhance/str
 /arr
   /lst


*
This e-mail may contain confidential or privileged information.
If you are not the intended recipient, please notify the sender immediately and 
then delete it.

TIAA-CREF
*




Re: Solr 3.6, Highlight and multi words?

2015-04-01 Thread Bruno Mannina

OK for qf (I can't test right now),

but with hl.simple.pre/hl.simple.post I can define only one color, no?

In the sample solrconfig.xml there are several colors:

<!-- multi-colored tag FragmentsBuilder -->
<fragmentsBuilder name="colored"
                  class="solr.highlight.ScoreOrderFragmentsBuilder">
  <lst name="defaults">
    <str name="hl.tag.pre"><![CDATA[
      <b style="background:yellow">,<b style="background:lawgreen">,
      <b style="background:aquamarine">,<b style="background:magenta">,
      <b style="background:palegreen">,<b style="background:coral">,
      <b style="background:wheat">,<b style="background:khaki">,
      <b style="background:lime">,<b style="background:deepskyblue">]]></str>
    <str name="hl.tag.post"><![CDATA[</b>]]></str>
  </lst>
</fragmentsBuilder>

How can I tell Solr to use these colors instead of hl.simple.pre/post?
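For what it's worth (hedged, untested against this index): that colored fragments builder belongs to the FastVectorHighlighter, so the request has to opt into it, and the highlighted fields must be indexed with term vectors. A sketch of both pieces, with illustrative field settings:

```xml
<!-- schema.xml (sketch): term vectors are required by the
     FastVectorHighlighter used with a fragmentsBuilder -->
<field name="aben" type="text_general" indexed="true" stored="true"
       termVectors="true" termPositions="true" termOffsets="true"/>
```

and on the request:

```
hl=true&hl.fl=tien,aben&hl.useFastVectorHighlighter=true&hl.fragmentsBuilder=colored
```

Without the term-vector options the highlighter falls back to the standard one, which only knows hl.simple.pre/hl.simple.post.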



Le 01/04/2015 20:58, Reitzel, Charles a écrit :

If you want to query on the field ab, you'll probably need to add it the qf 
parameter.

To control the highlighting markup, with the standard highlighter, use 
hl.simple.pre and hl.simple.post.

https://cwiki.apache.org/confluence/display/solr/Standard+Highlighter


-Original Message-
From: Bruno Mannina [mailto:bmann...@free.fr]
Sent: Wednesday, April 01, 2015 2:24 PM
To: solr-user@lucene.apache.org
Subject: Re: Solr 3.6, Highlight and multi words?

Dear Charles,

Thanks for your answer, please find below my answers.

ok it works if I use aben as field in my query as you say in Answer 1.
it doesn't work if I use ab may be because ab field is a copyField for 
abfr, aben, abit, abpt

Concerning the 2., yes you have right it's not and but AND

I have this result:

<lst name="DE102009043935B3">
  <arr name="tien">
    <str><em>Bicycle</em> frame comprises holder, particularly for
water bottle, where holder is connected</str>
  </arr>
  <arr name="aben">
    <str>#CMT# #/CMT# The <em>bicycle</em> frame (7) comprises a
holder (1), particularly for a water bottle</str>
    <str>. The holder is connected with the <em>bicycle</em> frame by
a screw (5), where a mounting element has a compensation</str>
    <str> section which is made of an elastic material, particularly
a <em>plastic</em> material. The compensation section</str>
  </arr>
</lst>


So my last question is why I haven't em/em instead having colored ?
How can I tell to solr to use the colored ?

Thanks a lot,
Bruno


Le 01/04/2015 17:15, Reitzel, Charles a écrit :

Haven't used Solr 3.x in a long time.  But with 4.10.x, I haven't had any 
trouble with multiple terms.  I'd look at a few things.

1.  Do you have a typo in your query?  Shouldn't it be q=aben:(plastic and 
bicycle)?
 
^^ 2. Try removing the word and from the query.  There may be some interaction with a stop word filter.  If you want a phrase query, wrap it in quotes.


3.  Also, be sure that the query and indexing analyzers for the aben field are 
compatible with each other.

-Original Message-
From: Bruno Mannina [mailto:bmann...@free.fr]
Sent: Wednesday, April 01, 2015 7:05 AM
To: solr-user@lucene.apache.org
Subject: Re: Solr 3.6, Highlight and multi words?

Sorry to disturb you with the renew but nobody use or have problem with 
multi-terms and highlight ?

regards,

Le 29/03/2015 21:15, Bruno Mannina a écrit :

Dear Solr User,

I try to work with highlight, it works well but only if I have only
one keyword in my query?!
If my request is plastic AND bicycle then only plastic is highlight.

my request is:

./select/?q=ab%3A%28plastic+and+bicycle%29&version=2.2&start=0&rows=10&indent=on&hl=true&hl.fl=tien,aben&fl=pn&f.aben.hl.snippets=5


Could you help me please to understand ? I read doc, google, without
success...
so I post here...

my result is:



   lst  name=DE202010012045U1
  arr  name=aben
str(EP2423092A1) #CMT# #/CMT# The bicycle pedal has a pedal
body (10) made fromlt;emgt;plasticlt;/emgt; material/str
str, particularly for touring bike. #CMT#ADVANTAGE : #/CMT#
The bicycle pedal has a pedal body made
fromlt;emgt;plasticlt;/emgt;/str
  /arr
/lst
lst  name=JP2014091382A
  arr  name=aben
str betweenlt;emgt;plasticlt;/emgt;  tapes 3 and 3
having two heat fusion layers, and the
twolt;emgt;plasticlt;/emgt;  tapes
3 and 3 are stuck/str
  /arr
/lst
lst  name=DE10201740A1
  arr  name=aben
str  elements. A connecting element is formed as a hinge, a
flexible foil or a flexiblelt;emgt;plasticlt;/emgt;  part.
#CMT#USE/str
  /arr
/lst
lst  name=US2008276751A1
  arr  name=aben
strA bicycle handlebar grip includes an inner fiber layer
and an outerlt;emgt;plasticlt;/emgt; layer. Thus, the fiber/str
str  handlebar grip, while thelt;emgt;plasticlt;/emgt;
layer is soft and has an adjustable thickness

Re: Solr 3.6, Highlight and multi words?

2015-04-01 Thread Bruno Mannina

of course no prb charles, you already help me !

Le 01/04/2015 21:54, Reitzel, Charles a écrit :

Sorry, I've never tried highlighting in multiple colors...

-Original Message-
From: Bruno Mannina [mailto:bmann...@free.fr]
Sent: Wednesday, April 01, 2015 3:43 PM
To: solr-user@lucene.apache.org
Subject: Re: Solr 3.6, Highlight and multi words?

ok for qf (i can't test now)

but with hl.simple.pre/hl.simple.post I can define only one color, no?

in the sample solrconfig.xml there are several colors:

<!-- multi-colored tag FragmentsBuilder -->
<fragmentsBuilder name="colored"
  class="solr.highlight.ScoreOrderFragmentsBuilder">
  <lst name="defaults">
    <str name="hl.tag.pre"><![CDATA[
      <b style="background:yellow">,<b style="background:lawgreen">,
      <b style="background:aquamarine">,<b style="background:magenta">,
      <b style="background:palegreen">,<b style="background:coral">,
      <b style="background:wheat">,<b style="background:khaki">,
      <b style="background:lime">,<b style="background:deepskyblue">]]></str>
    <str name="hl.tag.post"><![CDATA[</b>]]></str>
  </lst>
</fragmentsBuilder>

How can I tell Solr to use these colors instead of hl.simple.pre/post?
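For reference, and as an assumption to verify against your version's docs rather than a tested 3.6 recipe: a named fragmentsBuilder like the one above is consumed by the FastVectorHighlighter, selected per request with hl.fragmentsBuilder. A sketch that builds such a request (the host, port, and core are hypothetical; the field must be indexed with termVectors, termPositions, and termOffsets):

```python
from urllib.parse import urlencode

# Hypothetical endpoint; adjust host/port/core to your deployment.
base = "http://localhost:8983/solr/select"

params = {
    "q": "aben:(plastic AND bicycle)",
    "hl": "true",
    "hl.fl": "aben",
    # The FastVectorHighlighter is what consumes a <fragmentsBuilder>;
    # it requires termVectors/termPositions/termOffsets on the field.
    "hl.useFastVectorHighlighter": "true",
    # Name of the fragmentsBuilder defined in solrconfig.xml.
    "hl.fragmentsBuilder": "colored",
}
url = base + "?" + urlencode(params)
print(url)
```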



On 01/04/2015 20:58, Reitzel, Charles wrote:

If you want to query on the field ab, you'll probably need to add it the qf 
parameter.

To control the highlighting markup, with the standard highlighter, use 
hl.simple.pre and hl.simple.post.

https://cwiki.apache.org/confluence/display/solr/Standard+Highlighter
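To make that concrete, here is a sketch of a request using hl.simple.pre/hl.simple.post with a single background color (hypothetical host and core; field names taken from the thread):

```python
from urllib.parse import urlencode

# Hypothetical endpoint; adjust to your deployment.
base = "http://localhost:8983/solr/select"

params = {
    "q": "aben:(plastic AND bicycle)",
    "fl": "pn",
    "hl": "true",
    "hl.fl": "tien,aben",
    "hl.snippets": "5",
    # Standard highlighter: one opening/closing tag pair for all matches.
    "hl.simple.pre": '<b style="background:yellow">',
    "hl.simple.post": "</b>",
}
url = base + "?" + urlencode(params)
print(url)
```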


-Original Message-
From: Bruno Mannina [mailto:bmann...@free.fr]
Sent: Wednesday, April 01, 2015 2:24 PM
To: solr-user@lucene.apache.org
Subject: Re: Solr 3.6, Highlight and multi words?

Dear Charles,

Thanks for your answer, please find below my answers.

OK, it works if I use aben as the field in my query, as you say in answer 1.
It doesn't work if I use ab, maybe because the ab field is a copyField of
abfr, aben, abit, abpt.

Concerning point 2, yes, you are right: it's not and but AND.

I have this result:

<lst name="DE102009043935B3">
  <arr name="tien">
    <str><em>Bicycle</em> frame comprises holder, particularly for water bottle, where holder is connected</str>
  </arr>
  <arr name="aben">
    <str>#CMT# #/CMT# The <em>bicycle</em> frame (7) comprises a holder (1), particularly for a water bottle</str>
    <str>. The holder is connected with the <em>bicycle</em> frame by a screw (5), where a mounting element has a compensation</str>
    <str> section which is made of an elastic material, particularly a <em>plastic</em> material. The compensation section</str>
  </arr>
</lst>


So my last question is: why do I get <em>/</em> tags instead of the colored ones?
How can I tell Solr to use the colored tags?

Thanks a lot,
Bruno


On 01/04/2015 17:15, Reitzel, Charles wrote:

Haven't used Solr 3.x in a long time.  But with 4.10.x, I haven't had any 
trouble with multiple terms.  I'd look at a few things.

1.  Do you have a typo in your query?  Shouldn't it be q=aben:(plastic and 
bicycle)?
  
2. Try removing the word "and" from the query.  There may be some interaction with a stop word filter.  If you want a phrase query, wrap it in quotes.


3.  Also, be sure that the query and indexing analyzers for the aben field are 
compatible with each other.

-Original Message-
From: Bruno Mannina [mailto:bmann...@free.fr]
Sent: Wednesday, April 01, 2015 7:05 AM
To: solr-user@lucene.apache.org
Subject: Re: Solr 3.6, Highlight and multi words?

Sorry to bump this thread again, but does nobody use multi-term highlighting, or have problems with it?

regards,

On 29/03/2015 21:15, Bruno Mannina wrote:

Dear Solr User,

I try to work with highlight, it works well but only if I have only
one keyword in my query?!
If my request is plastic AND bicycle then only plastic is highlight.

my request is:

./select/?q=ab%3A%28plastic+and+bicycle%29&version=2.2&start=0&rows=10&indent=on&hl=true&hl.fl=tien,aben&fl=pn&f.aben.hl.snippets=5


Could you help me please to understand ? I read doc, google, without
success...
so I post here...

my result is:



<lst name="DE202010012045U1">
  <arr name="aben">
    <str>(EP2423092A1) #CMT# #/CMT# The bicycle pedal has a pedal body (10) made from <em>plastic</em> material</str>
    <str>, particularly for touring bike. #CMT#ADVANTAGE : #/CMT# The bicycle pedal has a pedal body made from <em>plastic</em></str>
  </arr>
</lst>
<lst name="JP2014091382A">
  <arr name="aben">
    <str> between <em>plastic</em> tapes 3 and 3 having two heat fusion layers, and the two <em>plastic</em> tapes 3 and 3 are stuck</str>
  </arr>
</lst>
<lst name="DE10201740A1">
  <arr name="aben">
    <str> elements. A connecting element

Solr 3.6, Highlight and multi words?

2015-03-29 Thread Bruno Mannina

Dear Solr User,

I am trying to use highlighting; it works well, but only if I have a single
keyword in my query?!
If my request is plastic AND bicycle, then only plastic is highlighted.

my request is:

./select/?q=ab%3A%28plastic+and+bicycle%29&version=2.2&start=0&rows=10&indent=on&hl=true&hl.fl=tien,aben&fl=pn&f.aben.hl.snippets=5

Could you help me please to understand ? I read doc, google, without
success...
so I post here...

my result is:



<lst name="DE202010012045U1">
  <arr name="aben">
    <str>(EP2423092A1) #CMT# #/CMT# The bicycle pedal has a pedal body (10) made from <em>plastic</em> material</str>
    <str>, particularly for touring bike. #CMT#ADVANTAGE : #/CMT# The bicycle pedal has a pedal body made from <em>plastic</em></str>
  </arr>
</lst>
<lst name="JP2014091382A">
  <arr name="aben">
    <str> between <em>plastic</em> tapes 3 and 3 having two heat fusion layers, and the two <em>plastic</em> tapes 3 and 3 are stuck</str>
  </arr>
</lst>
<lst name="DE10201740A1">
  <arr name="aben">
    <str> elements. A connecting element is formed as a hinge, a flexible foil or a flexible <em>plastic</em> part. #CMT#USE</str>
  </arr>
</lst>
<lst name="US2008276751A1">
  <arr name="aben">
    <str>A bicycle handlebar grip includes an inner fiber layer and an outer <em>plastic</em> layer. Thus, the fiber</str>
    <str> handlebar grip, while the <em>plastic</em> layer is soft and has an adjustable thickness to provide a comfortable</str>
    <str> sensation to a user. In addition, the <em>plastic</em> layer includes a holding portion coated on the outer surface</str>
    <str> layer to enhance the combination strength between the fiber layer and the <em>plastic</em> layer and to enhance</str>
  </arr>
</lst>






---
This email does not contain any viruses or malware because avast! Antivirus
protection is active.
http://www.avast.com


Re: Solr 3.6, Highlight and multi words?

2015-03-29 Thread Bruno Mannina

Additional information, in my schema.xml, my field is defined like this:

<field name="aben" type="text_en" indexed="true" stored="true" multiValued="true"/>

May be it misses something? like termVectors



On 29/03/2015 21:15, Bruno Mannina wrote:

Dear Solr User,

I try to work with highlight, it works well but only if I have only 
one keyword in my query?!

If my request is plastic AND bicycle then only plastic is highlight.

my request is:

./select/?q=ab%3A%28plastic+and+bicycle%29&version=2.2&start=0&rows=10&indent=on&hl=true&hl.fl=tien,aben&fl=pn&f.aben.hl.snippets=5



Could you help me please to understand ? I read doc, google, without 
success...

so I post here...

my result is:



<lst name="DE202010012045U1">
  <arr name="aben">
    <str>(EP2423092A1) #CMT# #/CMT# The bicycle pedal has a pedal body (10) made from <em>plastic</em> material</str>
    <str>, particularly for touring bike. #CMT#ADVANTAGE : #/CMT# The bicycle pedal has a pedal body made from <em>plastic</em></str>
  </arr>
</lst>
<lst name="JP2014091382A">
  <arr name="aben">
    <str> between <em>plastic</em> tapes 3 and 3 having two heat fusion layers, and the two <em>plastic</em> tapes 3 and 3 are stuck</str>
  </arr>
</lst>
<lst name="DE10201740A1">
  <arr name="aben">
    <str> elements. A connecting element is formed as a hinge, a flexible foil or a flexible <em>plastic</em> part. #CMT#USE</str>
  </arr>
</lst>
<lst name="US2008276751A1">
  <arr name="aben">
    <str>A bicycle handlebar grip includes an inner fiber layer and an outer <em>plastic</em> layer. Thus, the fiber</str>
    <str> handlebar grip, while the <em>plastic</em> layer is soft and has an adjustable thickness to provide a comfortable</str>
    <str> sensation to a user. In addition, the <em>plastic</em> layer includes a holding portion coated on the outer surface</str>
    <str> layer to enhance the combination strength between the fiber layer and the <em>plastic</em> layer and to enhance</str>
  </arr>
</lst>













Request two databases at the same time ?

2015-01-09 Thread Bruno Mannina

Dear All,

I use Apache-SOLR3.6, on Ubuntu (newbie user).

I have a big database named BigDB1 with 90M documents,
each document contains several fields (docid, title, author, date, etc...)

I received today from another source, abstract of some documents (there
are also the same docid field in this source).
I don't want to modify my BigDB1 to update documents with abstract
because BigDB1 is always updated twice by week.

Do you think it's possible to create a new database named AbsDB1 and
request the both database at the same time ?
 if I do for example:
title:airplane AND abstract:plastic

I would like to obtain documents from BigDB1 and AbsDB1.

Many thanks for your help, information and others things that can help me.

Regards,
Bruno




Re: Request two databases at the same time ?

2015-01-09 Thread Bruno Mannina

Dear Erick,

thank you for your answer.

My answers are below.

On 09/01/2015 20:43, Erick Erickson wrote:

bq: I don't want to modify my BigDB1 to update documents with abstract
because BigDB1 is always updated twice by week.

Why not? Solr/Lucene handle updating docs, if a doc in the index has
the same uniqueKey, the old doc is deleted and the new one takes its
place. So why not just put the new abstracts into BigDB1? If you
re-index the docs later (your twice/week comment), then they'll be
overwritten. This will be much simpler than trying to maintain two.
I understand this process; I use it for other collections, and twice a week
for BigDB1.
But, for example: Doc1 is updated with an abstract on Monday; on Tuesday I
must update it with new data, and the abstract is lost.
I can't fetch the abstract and re-insert it into the new doc, because I
receive several thousand docs (new and amended) every week;

I think that would take a long time.
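One possible way out, hedged because it requires upgrading: Solr 4.0+ supports atomic updates, which can set a single field (the abstract) on an existing document, provided all fields are stored so Solr can rebuild the rest of the document server-side. A sketch of the JSON body such an update would send (the docid value and field names are illustrative):

```python
import json

# Atomic update (Solr 4.0+): only the "abstract" field changes; the other
# stored fields of the matching document are preserved server-side.
update = [
    {"docid": "EP2423092A1",  # hypothetical uniqueKey value
     "abstract": {"set": "The bicycle pedal has a pedal body ..."}}
]
body = json.dumps(update)
print(body)  # POST to /update with Content-Type: application/json
```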


But if you cannot update BigDB1 just fire off two queries and combine
them. Or specify the shards parameter on the URL pointing to both
collections. Do note, though, that the relevance calculations may not
be absolutely comparable, so mixing the results may show some
surprises...

Shards... I will take a look at this; I didn't know this param.
Concerning relevance, I don't really use it, so it won't be a problem, I
think.
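The shards parameter Erick mentions asks one core to fan the query out to several cores and merge the result lists. A sketch of such a request (hypothetical hosts and core names; note that distributed search merges per-shard result lists — it does not join fields across the two indexes, so a query mixing BigDB1-only and AbsDB1-only fields would not match as a join):

```python
from urllib.parse import urlencode

# Hypothetical cores; the shards value lists host:port/path entries
# without the http:// prefix.
params = {
    "q": "title:airplane",
    "shards": "localhost:8983/solr/BigDB1,localhost:8983/solr/AbsDB1",
}
url = "http://localhost:8983/solr/BigDB1/select?" + urlencode(params)
print(url)
```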



Sincerely,


Best,
Erick

On Fri, Jan 9, 2015 at 9:12 AM, Bruno Mannina bmann...@free.fr wrote:

Dear All,

I use Apache-SOLR3.6, on Ubuntu (newbie user).

I have a big database named BigDB1 with 90M documents,
each document contains several fields (docid, title, author, date, etc...)

I received today from another source, abstract of some documents (there are
also the same docid field in this source).
I don't want to modify my BigDB1 to update documents with abstract because
BigDB1 is always updated twice by week.

Do you think it's possible to create a new database named AbsDB1 and request
the both database at the same time ?
  if I do for example:
title:airplane AND abstract:plastic

I would like to obtain documents from BigDB1 and AbsDB1.

Many thanks for your help, information and others things that can help me.

Regards,
Bruno










Re: How can I request a big list of values ?

2014-08-10 Thread Bruno Mannina

Hi Jack,

OK, but for 2000 values that means I must do 40 requests if I choose 50
values per request :'(
and in my case a user can choose about 8 topics, so that can generate 8
times 40 requests... humm...


Isn't it possible to send a text, JSON, or XML file?

On 10/08/2014 17:38, Jack Krupansky wrote:
Generally, large requests are an anti-pattern in modern distributed 
systems. Better to have a number of smaller requests executing in 
parallel and then merge the results in the application layer.


-- Jack Krupansky

-Original Message- From: Bruno Mannina
Sent: Saturday, August 9, 2014 7:18 PM
To: solr-user@lucene.apache.org
Subject: How can I request a big list of values ?

Hi All,

I'm using actually SOLR 3.6 and I have around 91 000 000 docs inside.

All work fine, it's great :)

But now, I would like to request a list of values in the same field
(more than 2000 values)

I know I can use ?q=x:(AAA BBB CCC ...) (my default operator is OR)

but I have a list of 2000 values ! I think it's not the good idea to use
this method.

Can someone help me to find the good solution ?
Can I use a json structure by using a POST method ?

Thanks a lot,
Bruno










Re: How can I request a big list of values ?

2014-08-10 Thread Bruno Mannina

Hi Anshum,

I can do it with the 3.6 release, no?

My main problem is that I have around 2000 values, so I can't put them all in
one request; the query would be too long. :'(


I will look at generating several requests (as Jack proposes), but even in
that case it doesn't seem safe...


On 10/08/2014 19:45, Anshum Gupta wrote:

Hi Bruno,

If you would have been on a more recent release,
https://issues.apache.org/jira/browse/SOLR-6318 would have come in
handy perhaps.
You might want to look at patching your version with this though (as a
work around).
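For readers on newer versions: the SOLR-6318 change Anshum links is the terms query parser, which accepts a comma-separated value list and pairs naturally with a POST so the 2000 values live in the request body rather than the URL. A sketch (the field name x comes from the thread; this requires a Solr release that ships the {!terms} parser):

```python
from urllib.parse import urlencode

# The 2000 values would go here; a short list for illustration.
values = ["AAA", "BBB", "CCC"]

# {!terms} matches any of the comma-separated values in field 'x'.
body = urlencode({"q": "{!terms f=x}" + ",".join(values), "rows": "10"})
# POST 'body' to /solr/select with
# Content-Type: application/x-www-form-urlencoded
print(body)
```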

On Sat, Aug 9, 2014 at 4:18 PM, Bruno Mannina bmann...@free.fr wrote:

Hi All,

I'm using actually SOLR 3.6 and I have around 91 000 000 docs inside.

All work fine, it's great :)

But now, I would like to request a list of values in the same field (more
than 2000 values)

I know I can use ?q=x:(AAA BBB CCC ...) (my default operator is OR)

but I have a list of 2000 values ! I think it's not the good idea to use
this method.

Can someone help me to find the good solution ?
Can I use a json structure by using a POST method ?

Thanks a lot,
Bruno











How can I request a big list of values ?

2014-08-09 Thread Bruno Mannina

Hi All,

I'm using actually SOLR 3.6 and I have around 91 000 000 docs inside.

All work fine, it's great :)

But now, I would like to request a list of values in the same field
(more than 2000 values)

I know I can use ?q=x:(AAA BBB CCC ...) (my default operator is OR)

but I have a list of 2000 values ! I think it's not the good idea to use
this method.

Can someone help me to find the good solution ?
Can I use a json structure by using a POST method ?

Thanks a lot,
Bruno




Re: Indexed a new big database while the old is running?

2014-02-19 Thread Bruno Mannina

Hi Shawn,

Thanks for your answer.

Actually we don't have performance problems because we only run select requests.
We have 4 CPUs (8 cores) and 24 GB RAM.

I know how to create an alias; my question was just about performance, and
you are right: it's impossible to answer without more information about my
system, sorry.


I will run a real test and check whether performance drops; if it does, I
will stop the new indexing.


If you have more information about indexing performance with my server
config, don't hesitate to write me. :)

Have a nice day,

Regards,
Bruno


On 18/02/2014 16:30, Shawn Heisey wrote:

On 2/18/2014 5:28 AM, Bruno Mannina wrote:

We have actually a SOLR db with around 88 000 000 docs.
All work fine :)

We receive each year a new backfile with the same content (but improved).

Index these docs takes several days on SOLR,
So is it possible to create a new collection (restart SOLR) and
Index these new 88 000 000 docs without stopping the current collection ?

We have around 1 million connections by month.

Do you think that this new indexation may cause problem to SOLR using?
Note: new database will not be used until the current collection will be
stopped.

You can instantly switch between collections by using the alias feature.
  To do this, you would have collections named something like test201302
and test201402, then you would create an alias named 'test' that points
to one of these collections.  Your code can use 'test' as the collection
name.

Without a lot more information, it's impossible to say whether building
a new collection will cause performance problems for the existing
collection.

It does seem like a problem that rebuilding the index takes several
days.  You might already be having performance problems.  It's also
possible that there's an aspect to this that I am not seeing, and that
several days is perfectly normal for YOUR index.

Not enough RAM is the most common reason for performance issues on a
large index:

http://wiki.apache.org/solr/SolrPerformanceProblems

Thanks,
Shawn
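For later SolrCloud versions (4.2+), the alias switch Shawn describes is a single Collections API call; a sketch using the hypothetical collection names from his example:

```python
from urllib.parse import urlencode

# Point the stable alias 'test' at the freshly built collection.
# Clients keep querying /solr/test and never notice the swap.
params = {
    "action": "CREATEALIAS",
    "name": "test",
    "collections": "test201402",
}
url = "http://localhost:8983/solr/admin/collections?" + urlencode(params)
print(url)
```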







Indexed a new big database while the old is running?

2014-02-18 Thread Bruno Mannina

Dear Solr Users,

We have actually a SOLR db with around 88 000 000 docs.
All work fine :)

We receive each year a new backfile with the same content (but improved).

Index these docs takes several days on SOLR,
So is it possible to create a new collection (restart SOLR) and
Index these new 88 000 000 docs without stopping the current collection ?

We have around 1 million connections by month.

Do you think that this new indexation may cause problem to SOLR using?
Note: new database will not be used until the current collection will be 
stopped.


Thx for your comment,
Bruno



How to request not directly my SOLR server ?

2013-11-26 Thread Bruno Mannina

Dear All,

I showed my Solr server to a friend, and his first question was:

"You can query your Solr database directly from your web browser?! Isn't that
a security problem? Anyone who has your request link can use your database
directly?"

So I ask the question here. I protect my admin panel, but is it possible
to protect direct requests?

Searching Google, most results concern admin panel security, but I can't
find information about this.

Thanks for your comment,

Bruno




Re: How to request not directly my SOLR server ?

2013-11-26 Thread Bruno Mannina

On 26/11/2013 18:52, Shawn Heisey wrote:

On 11/26/2013 8:37 AM, Bruno Mannina wrote:

I show my SOLR server to a friend and its first question was:

You can request directly your solr database from your internet 
explorer?! is it not a security problem?

each person which has your request link can use your database directly?

So I ask the question here. I protect my admin panel but is it 
possible to protect a direct request ?


Don't make your Solr server directly accessible from the Internet.  
Only make it accessible from the machines that serve your website and 
whoever needs to administer it.


Solr has no security features.  You can use the security features in 
whatever container is running Solr, but that is outside the scope of 
this mailing list.


Thanks,
Shawn




Thanks a lot for this information,

Bruno




Normalized data during indexing ?

2013-10-25 Thread Bruno Mannina

Dear,

I would like to know if SOLR can do that:

I have a field named Assignee with values like:

Int Business Machines Corp
Int Business Mach Inc

I would like to have a result field in the schema.xml named
Norm_Assignee which contains
the translation with a lexical file:

Int Business Machines Corp => IBM
Int Business Mach Inc => IBM

So, I will have:

<doc>
  <arr name="assignee">
    <str>Int Business Machines Corp</str>
  </arr>
  <arr name="norm_assignee">
    <str>IBM</str>
  </arr>
</doc>
<doc>
  <arr name="assignee">
    <str>Int Business Mach Inc</str>
  </arr>
  <arr name="norm_assignee">
    <str>IBM</str>
  </arr>
</doc>
and if no correspondence exists, then the field is simply not created.

I'm sure this idea is possible with Solr, but I couldn't find anything on the
wiki, Google, or Solr support channels.

Thanks for any idea,

Bruno





Re: Terms function join with a Select function ?

2013-10-25 Thread Bruno Mannina

Hi Erick,

I think it's a memory problem; I run my tests on a small computer at home
(8 GB RAM, i3-2120 3.30 GHz, 64-bit)


and my database is very big: 87M docs, about 200 GB.

I thought Solr could compute the statistics on only the query result, i.e.
here on around 3000 docs (around 6000 terms);

that's not so big.

I haven't analyzed the logs yet; I'll do it in a few hours when I get back home.

Thanks,
Bruno

On 25/10/2013 15:36, Erick Erickson wrote:

How many unique values are in the field? Solr has to create a counter
for each and every one of them, you may be blowing memory up. What
do the logs say?


Best,
Erick


On Thu, Oct 24, 2013 at 4:07 PM, Bruno Mannina bmann...@free.fr wrote:


Just a little precision: solr down after running my URL :( so bad...

On 24/10/2013 22:04, Bruno Mannina wrote:

  humm facet perfs are very bad (Solr 3.6.0)

My index is around 87 000 000 docs. (4 * Proc double core, 24G Ram)

I thought facets will work only on the result but it seems it's not the
case.

My request:
http://localhost:2727/solr/select?q=ti:snowboard&rows=0&facet=true&facet.field=ap&facet.limit=5

Do you think my request is wrong ?

Maybe it's not possible to have statistic on a field (like Terms
function) on a query.

Thx for your help,

Bruno


On 24/10/2013 19:40, Bruno Mannina wrote:


Dear,

humI don't know how can I use it..;

I tried:

my query:
ti:snowboard (3095 results)

I would like to have at the end of my XML, the Terms statistic for the
field AP (applicant field (patent notice))

but I haven't that...

Please help,
Bruno

/select?q=ti%3Asnowboard&version=2.2&start=0&rows=10&indent=on&facet=true&f.ap.facet.limit=10

On 24/10/2013 14:04, Erik Hatcher wrote:


That would be called faceting :)

  
http://wiki.apache.org/solr/SimpleFacetParameters




On Oct 24, 2013, at 5:23 AM, Bruno Mannina bmann...@free.fr wrote:

  Dear All,

Ok I have an answer concerning the first question (limit)
It's the terms.limit parameters.

But I can't find how to apply a Terms request on a query result

any idea ?

Bruno

On 23/10/2013 23:19, Bruno Mannina wrote:


Dear Solr users,

I use the Terms function to see the frequency data in a field but
it's for the whole database.

I have 2 questions:
- Is it possible to increase the number of statistic ? actually I
have the 10 first frequency term.

- Is it possible to limit this statistic to the result of a request ?

PS: the second question is very important for me.

Many thanks



























Re: Normalized data during indexing ?

2013-10-25 Thread Bruno Mannina

Hi Michael,

thanks it sounds like I'm looking for

I need to investigate

Thanks a lot !

On 25/10/2013 14:46, michael.boom wrote:

Maybe this can help you:
https://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory



-
Thanks,
Michael
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Normalized-data-during-indexing-tp4097750p4097752.html
Sent from the Solr - User mailing list archive at Nabble.com.
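A sketch of how the linked SynonymFilterFactory could do this normalization: a mapping file referenced from the index-time analyzer of a norm_assignee field type. The file name, field names, and copyField wiring are illustrative assumptions, not the thread's confirmed setup:

```python
# Hypothetical synonyms file, e.g. norm_assignee_synonyms.txt, referenced
# from schema.xml with:
#   <filter class="solr.SynonymFilterFactory"
#           synonyms="norm_assignee_synonyms.txt" ignoreCase="true"/>
mapping = """\
Int Business Machines Corp => IBM
Int Business Mach Inc => IBM
"""

# The filter conceptually rewrites the left side to the right side:
norm = dict(line.split(" => ") for line in mapping.strip().splitlines())
print(norm["Int Business Mach Inc"])
```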








Re: Terms function join with a Select function ?

2013-10-24 Thread Bruno Mannina

Dear All,

Ok I have an answer concerning the first question (limit)
It's the terms.limit parameters.

But I can't find how to apply a Terms request on a query result

any idea ?

Bruno
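Putting the first point in request form: a sketch of a TermsComponent call with a raised terms.limit (hypothetical endpoint; TermsComponent reads raw index terms for the whole index, which is why it cannot be restricted to one query's results — faceting is the per-query tool):

```python
from urllib.parse import urlencode

# Hypothetical /terms handler; terms.limit raises the default of 10.
params = {"terms.fl": "ap", "terms.limit": "50"}
url = "http://localhost:8983/solr/terms?" + urlencode(params)
print(url)
```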

On 23/10/2013 23:19, Bruno Mannina wrote:

Dear Solr users,

I use the Terms function to see the frequency data in a field but it's 
for the whole database.


I have 2 questions:
- Is it possible to increase the number of statistic ? actually I have 
the 10 first frequency term.


- Is it possible to limit this statistic to the result of a request ?

PS: the second question is very important for me.

Many thanks








Re: Terms function join with a Select function ?

2013-10-24 Thread Bruno Mannina

Dear,

humI don't know how can I use it..;

I tried:

my query:
ti:snowboard (3095 results)

I would like to have at the end of my XML, the Terms statistic for the 
field AP (applicant field (patent notice))


but I haven't that...

Please help,
Bruno

/select?q=ti%3Asnowboard&version=2.2&start=0&rows=10&indent=on&facet=true&f.ap.facet.limit=10

On 24/10/2013 14:04, Erik Hatcher wrote:

That would be called faceting :)

 http://wiki.apache.org/solr/SimpleFacetParameters




On Oct 24, 2013, at 5:23 AM, Bruno Mannina bmann...@free.fr wrote:


Dear All,

Ok I have an answer concerning the first question (limit)
It's the terms.limit parameters.

But I can't find how to apply a Terms request on a query result

any idea ?

Bruno

On 23/10/2013 23:19, Bruno Mannina wrote:

Dear Solr users,

I use the Terms function to see the frequency data in a field but it's for the 
whole database.

I have 2 questions:
- Is it possible to increase the number of statistic ? actually I have the 10 
first frequency term.

- Is it possible to limit this statistic to the result of a request ?

PS: the second question is very important for me.

Many thanks














Re: Terms function join with a Select function ?

2013-10-24 Thread Bruno Mannina

Hmm, facet performance is very bad (Solr 3.6.0).
My index is around 87 000 000 docs. (4 dual-core processors, 24 GB RAM)

I thought facets would be computed only on the result set, but it seems
that's not the case.


My request:
http://localhost:2727/solr/select?q=ti:snowboard&rows=0&facet=true&facet.field=ap&facet.limit=5

Do you think my request is wrong ?

Maybe it's not possible to get statistics on a field (like the Terms
function does) restricted to a query result.


Thx for your help,

Bruno


On 24/10/2013 19:40, Bruno Mannina wrote:

Dear,

humI don't know how can I use it..;

I tried:

my query:
ti:snowboard (3095 results)

I would like to have at the end of my XML, the Terms statistic for the 
field AP (applicant field (patent notice))


but I haven't that...

Please help,
Bruno

/select?q=ti%3Asnowboard&version=2.2&start=0&rows=10&indent=on&facet=true&f.ap.facet.limit=10



On 24/10/2013 14:04, Erik Hatcher wrote:

That would be called faceting :)

 http://wiki.apache.org/solr/SimpleFacetParameters




On Oct 24, 2013, at 5:23 AM, Bruno Mannina bmann...@free.fr wrote:


Dear All,

Ok I have an answer concerning the first question (limit)
It's the terms.limit parameters.

But I can't find how to apply a Terms request on a query result

any idea ?

Bruno

On 23/10/2013 23:19, Bruno Mannina wrote:

Dear Solr users,

I use the Terms function to see the frequency data in a field but 
it's for the whole database.


I have 2 questions:
- Is it possible to increase the number of statistic ? actually I 
have the 10 first frequency term.


- Is it possible to limit this statistic to the result of a request ?

PS: the second question is very important for me.

Many thanks




















Re: Terms function join with a Select function ?

2013-10-24 Thread Bruno Mannina

Just one more detail: Solr went down after running my URL :( so bad...

On 24/10/2013 22:04, Bruno Mannina wrote:

humm facet perfs are very bad (Solr 3.6.0)
My index is around 87 000 000 docs. (4 * Proc double core, 24G Ram)

I thought facets will work only on the result but it seems it's not 
the case.


My request:
http://localhost:2727/solr/select?q=ti:snowboard&rows=0&facet=true&facet.field=ap&facet.limit=5



Do you think my request is wrong ?

Maybe it's not possible to have statistic on a field (like Terms 
function) on a query.


Thx for your help,

Bruno


On 24/10/2013 19:40, Bruno Mannina wrote:

Dear,

humI don't know how can I use it..;

I tried:

my query:
ti:snowboard (3095 results)

I would like to have at the end of my XML, the Terms statistic for 
the field AP (applicant field (patent notice))


but I haven't that...

Please help,
Bruno

/select?q=ti%3Asnowboard&version=2.2&start=0&rows=10&indent=on&facet=true&f.ap.facet.limit=10



On 24/10/2013 14:04, Erik Hatcher wrote:

That would be called faceting :)

 http://wiki.apache.org/solr/SimpleFacetParameters




On Oct 24, 2013, at 5:23 AM, Bruno Mannina bmann...@free.fr wrote:


Dear All,

Ok I have an answer concerning the first question (limit)
It's the terms.limit parameters.

But I can't find how to apply a Terms request on a query result

any idea ?

Bruno

On 23/10/2013 23:19, Bruno Mannina wrote:

Dear Solr users,

I use the Terms function to see the frequency data in a field but 
it's for the whole database.


I have 2 questions:
- Is it possible to increase the number of statistic ? actually I 
have the 10 first frequency term.


- Is it possible to limit this statistic to the result of a request ?

PS: the second question is very important for me.

Many thanks


























Is Solr can create temporary sub-index ?

2013-10-23 Thread Bruno Mannina

Dear Solr User,

We have to do a new web project: connect our Solr database to a web platform.


This web platform will be used by several users at the same time.
They send requests to our Solr and can apply filters to the results.

i.e.:
Our Solr contains 87M docs.
A user sends a request; the result is a few hundred to several thousand docs.
On the web platform, the user will see the first 20 results (or more via the
Next Page button), but he will also need to filter the whole result set by
additional terms (terms that our platform will suggest to him).


Can Solr create a temporary index (managed by Solr itself for the duration of
a web session)?


My goal is to avoid downloading the whole result set to a local computer in
order to filter it, and to avoid re-sending the same request several times
with new criteria added.

Many thanks for your comment,

Regards,
Bruno

