Re: Adding a new shard

2016-04-15 Thread Jay Potharaju
I found ticket https://issues.apache.org/jira/browse/SOLR-5025, which talks
about sharding in SolrCloud. Are there any plans to address this issue in the
near future?
Can any of the users on the forum comment on how they are handling this
scenario in production?
Thanks

On Fri, Apr 15, 2016 at 4:28 PM, Jay Potharaju wrote:

> Hi,
> I have an existing collection with 2 shards, one on each node in the
> cloud. Now I want to split the collection into 3 shards because of an
> increase in data volume, and create the new shard on a new node in the
> SolrCloud.
>
> I read about splitting a shard & creating a shard, but I'm not sure it
> will work.
>
> Any suggestions on how others are handling this scenario in production?
> --
> Thanks
> Jay
>
>



-- 
Thanks
Jay Potharaju


dataimport db-data-config.xml

2016-04-15 Thread kishor
I am trying to run two PostgreSQL queries on the same data source. Is this
possible in db-data-config.xml?

[the db-data-config.xml snippet did not survive archiving]

This code is not working; please suggest a working example.




--
View this message in context: 
http://lucene.472066.n3.nabble.com/dataimport-db-data-config-xml-tp4270673.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Getting duplicate output while doing auto suggestion based on multiple fields using copyField in Solr 5.5

2016-04-15 Thread Chris Hostetter

I can't explain the results you are seeing, but you also didn't provide us 
with your schema.xml (ie: how are "text" and "text_auto" defined?) or 
enough details to try and reproduce on a small scale (ie: what does the 
source data look like in the documents where these suggestion values 
are coming from?).

If i start up the "bin/solr -e techproducts" example, which is also 
configured to use DocumentDictionaryFactory, I don't see any duplicate 
suggestions...

curl 
'http://localhost:8983/solr/techproducts/suggest?suggest.dictionary=mySuggester&suggest.build=true&suggest=true&wt=json'
{"responseHeader":{"status":0,"QTime":13},"command":"build"}
curl 
'http://localhost:8983/solr/techproducts/suggest?wt=json&indent=true&suggest.dictionary=mySuggester&suggest=true&suggest.q=elec'
{
  "responseHeader":{
"status":0,
"QTime":1},
  "suggest":{"mySuggester":{
  "elec":{
"numFound":3,
"suggestions":[{
"term":"electronics and computer1",
"weight":2199,
"payload":""},
  {
"term":"electronics",
"weight":649,
"payload":""},
  {
"term":"electronics and stuff2",
"weight":279,
"payload":""}]

...can you provide us with some precise (and ideally minimal) steps to 
reproduce the problem you are describing?


For Example...

1) "Add XYZ to the 5.5 sample_techproducts_configs solrconfig.xml"
2) "Add ABC to the 5.5 sample_techproducts_configs managed-schema"
3) run this curl command to index a few sample documents...
4) run this curl command to see some suggest results that have duplicates 
in them based on the sample data from step #3


?


-Hoss
http://www.lucidworks.com/


Re: MiniSolrCloudCluster usage in solr 7.0.0

2016-04-15 Thread Chris Hostetter

: At first, I saw the same exception you got ... but after a little while
: I figured out that this is because I was running the program more than
: once without deleting everything in the baseDir -- so the zookeeper
: server was starting with an existing database already containing the
: solr.xml.  When MiniSolrCloudCluster is used in Solr tests, the baseDir
: is newly created for each test class, so this doesn't happen.

Yeah ... this is interesting.  I would definitely suggest that for now you 
*always* start with a clean baseDir.  I've opened an issue to figure out 
whether MiniSolrCloudCluster should fail if you don't, or make it a 
supported use case...

https://issues.apache.org/jira/browse/SOLR-8999
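
For anyone hitting the same NodeExists error: the "clean baseDir" step can be sketched with only the JDK, run before constructing MiniSolrCloudCluster. This is an illustrative sketch, not Solr code; the directory name "testcluster" comes from the example earlier in this thread, and the class and method names are mine:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Comparator;
import java.util.stream.Stream;

// Sketch: wipe the cluster baseDir so a stale embedded ZooKeeper database
// (containing /solr/solr.xml from a previous run) cannot survive.
public class CleanBaseDir {
    static void deleteRecursively(Path dir) throws IOException {
        if (!Files.exists(dir)) {
            return;  // nothing to clean on a first run
        }
        try (Stream<Path> walk = Files.walk(dir)) {
            // reverse order deletes children before their parent directories
            walk.sorted(Comparator.reverseOrder())
                .forEach(p -> p.toFile().delete());
        }
    }

    public static void main(String[] args) throws IOException {
        Path baseDir = Paths.get("testcluster");
        deleteRecursively(baseDir);          // clear stale ZK data, solr.xml, etc.
        Files.createDirectories(baseDir);    // fresh, empty baseDir for the cluster
        System.out.println(Files.list(baseDir).count());  // prints 0
    }
}
```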



-Hoss
http://www.lucidworks.com/


Adding a new shard

2016-04-15 Thread Jay Potharaju
Hi,
I have an existing collection with 2 shards, one on each node in the
cloud. Now I want to split the collection into 3 shards because of an
increase in data volume, and create the new shard on a new node in the
SolrCloud.

I read about splitting a shard & creating a shard, but I'm not sure it
will work.

Any suggestions on how others are handling this scenario in production?
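
For reference, the Collections API calls involved can be sketched as URL builders. This is a hedged sketch: host, collection, and node names are placeholders, and note that SPLITSHARD splits one shard into two sub-shards on the same node rather than rebalancing into three equal shards:

```java
// Sketch: build Collections API URLs for growing a 2-shard collection.
// The host, collection, shard, and node names are placeholders.
public class ShardGrowth {
    // SPLITSHARD splits an existing shard into two sub-shards.
    static String splitShard(String solrBase, String collection, String shard) {
        return solrBase + "/admin/collections?action=SPLITSHARD"
                + "&collection=" + collection + "&shard=" + shard;
    }

    // ADDREPLICA can then place a replica of a sub-shard on the new node;
    // the node name uses the host:port_solr form registered in ZooKeeper.
    static String addReplica(String solrBase, String collection, String shard,
                             String nodeName) {
        return solrBase + "/admin/collections?action=ADDREPLICA"
                + "&collection=" + collection + "&shard=" + shard
                + "&node=" + nodeName;
    }

    public static void main(String[] args) {
        String base = "http://localhost:8983/solr";
        System.out.println(splitShard(base, "mycoll", "shard1"));
        System.out.println(addReplica(base, "mycoll", "shard1_0", "10.0.0.3:8983_solr"));
    }
}
```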
-- 
Thanks
Jay


SOLR-3666

2016-04-15 Thread Jay Potharaju
Hi,
I am using SolrCloud with DIH for indexing my data. Is it possible to get
the status of all my DIH instances across all the nodes in the cloud? I saw
this jira ticket from a couple of years ago.
https://issues.apache.org/jira/browse/SOLR-3666

Can any of the contributors comment on whether this will be resolved? The
only alternative I know of is to get a list of all the nodes in the cloud
and poll each one of them to check the DIH status. Not the most efficient
way, but it will work.
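
The polling workaround described above can be sketched like this (node addresses and the core name are placeholders; each URL would then be fetched with an HTTP client and the returned DIH "status" field inspected):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch of the per-node polling workaround: hit every node's DIH handler
// with command=status. Node base URLs and the core name are placeholders.
public class DihStatusUrls {
    static List<String> statusUrls(List<String> nodeBaseUrls, String core) {
        List<String> urls = new ArrayList<>();
        for (String base : nodeBaseUrls) {
            // /dataimport?command=status reports the import state for that core
            urls.add(base + "/" + core + "/dataimport?command=status&wt=json");
        }
        return urls;
    }

    public static void main(String[] args) {
        List<String> nodes = Arrays.asList(
                "http://10.0.0.1:8983/solr", "http://10.0.0.2:8983/solr");
        for (String u : statusUrls(nodes, "mycoll_shard1_replica1")) {
            System.out.println(u);  // fetch each and inspect the "status" field
        }
    }
}
```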



-- 
Thanks
Jay


Re: Question on Solr JDBC driver with SQL client like DB Visualizer

2016-04-15 Thread Joel Bernstein
Ok, I think I know the problem you're running into. You'll need to load the
solr-solrj jar after loading the jars in the solrj-lib. Otherwise DbVis
seems to get confused and lose the driver class. We'll work on putting out
a single jar for the JDBC driver.

Joel Bernstein
http://joelsolr.blogspot.com/

On Fri, Apr 15, 2016 at 4:23 PM, Joel Bernstein  wrote:

> I just added the driver again to DBVisualizer and it found it for me. You
> can see the attached screen shot.
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
> On Fri, Apr 15, 2016 at 3:13 PM, Joel Bernstein 
> wrote:
>
>> What version of DbVisualizer are you using?
>>
>> When I tested I was using the latest version.
>>
>> Joel Bernstein
>> http://joelsolr.blogspot.com/
>>
>> On Fri, Apr 15, 2016 at 12:47 PM, Reth RM  wrote:
>>
>>> output of command :
>>>
>>> org/apache/solr/client/solrj/io/sql/
>>>META-INF/services/java.sql.Driver
>>>   org/apache/solr/client/solrj/io/sql/ConnectionImpl.class
>>>  org/apache/solr/client/solrj/io/sql/DatabaseMetaDataImpl.class
>>>org/apache/solr/client/solrj/io/sql/DriverImpl.class
>>>  org/apache/solr/client/solrj/io/sql/ResultSetImpl.class
>>>org/apache/solr/client/solrj/io/sql/ResultSetMetaDataImpl.class
>>>   org/apache/solr/client/solrj/io/sql/StatementImpl.class
>>>org/apache/solr/client/solrj/io/sql/package-info.class
>>>
>>>
>>>
>>> On Fri, Apr 15, 2016 at 9:01 PM, Kevin Risden
>>> wrote:
>>>
>>> > >
>>> > > Page 11, the screenshot specifies to select a
>>> > > "solr-solrj-6.0.0-SNAPSHOT.jar", which is equivalent to
>>> > > "solr-solrj-6.0.0.jar" shipped with the released version, correct?
>>> > >
>>> >
>>> > Correct, the PDF was generated before 6.0.0 was released. The
>>> documentation
>>> > from SOLR-8521 is being migrated to here:
>>> >
>>> >
>>> >
>>> https://cwiki.apache.org/confluence/display/solr/Parallel+SQL+Interface#ParallelSQLInterface-SQLClientsandDatabaseVisualizationTools
>>> >
>>> >
>>> > > When I try adding that jar, it doesn't show the driver class;
>>> DBVisualizer
>>> > > still shows "No new driver class". Does it mean the class is not
>>> added to
>>> > > this jar yet?
>>> > >
>>> >
>>> > I checked the Solr 6.0.0 release and the driver is there. I was
>>> testing it
>>> > yesterday for a blog series that I'm putting together.
>>> >
>>> > Just for reference here is the output for the Solr 6 release:
>>> >
>>> > tar -tvf solr-solrj-6.0.0.jar | grep sql
>>> > drwxrwxrwx  0 0  0   0 Apr  1 14:40
>>> > org/apache/solr/client/solrj/io/sql/
>>> > -rwxrwxrwx  0 0  0 842 Apr  1 14:40
>>> > META-INF/services/java.sql.Driver
>>> > -rwxrwxrwx  0 0  0   10124 Apr  1 14:40
>>> > org/apache/solr/client/solrj/io/sql/ConnectionImpl.class
>>> > -rwxrwxrwx  0 0  0   23557 Apr  1 14:40
>>> > org/apache/solr/client/solrj/io/sql/DatabaseMetaDataImpl.class
>>> > -rwxrwxrwx  0 0  04459 Apr  1 14:40
>>> > org/apache/solr/client/solrj/io/sql/DriverImpl.class
>>> > -rwxrwxrwx  0 0  0   28333 Apr  1 14:40
>>> > org/apache/solr/client/solrj/io/sql/ResultSetImpl.class
>>> > -rwxrwxrwx  0 0  05167 Apr  1 14:40
>>> > org/apache/solr/client/solrj/io/sql/ResultSetMetaDataImpl.class
>>> > -rwxrwxrwx  0 0  0   10451 Apr  1 14:40
>>> > org/apache/solr/client/solrj/io/sql/StatementImpl.class
>>> > -rwxrwxrwx  0 0  0 141 Apr  1 14:40
>>> > org/apache/solr/client/solrj/io/sql/package-info.class
>>> >
>>> >
>>> > Kevin Risden
>>> > Apache Lucene/Solr Committer
>>> > Hadoop and Search Tech Lead | Avalon Consulting, LLC
>>> > 
>>> > M: 732 213 8417
>>> > LinkedIn  |
>>> Google+
>>> >  | Twitter
>>> > 
>>> >
>>> >
>>> >
>>> -
>>> > This message (including any attachments) contains confidential
>>> information
>>> > intended for a specific individual and purpose, and is protected by
>>> law. If
>>> > you are not the intended recipient, you should delete this message. Any
>>> > disclosure, copying, or distribution of this message, or the taking of
>>> any
>>> > action based on it, is strictly prohibited.
>>> >
>>>
>>
>>
>


Re: MiniSolrCloudCluster usage in solr 7.0.0

2016-04-15 Thread Shawn Heisey
On 4/14/2016 8:32 AM, Rohana Rajapakse wrote:
> I have added a few dependency jars to my project. There are no compilation 
> errors or ClassNotFound exceptions, but I get a Zookeeper exception: 
> "KeeperException$NodeExistsException: KeeperErrorCode = NodeExists for 
> /solr/solr.xml". My temporary solrHome folder has a solr.xml.  No other 
> files (solrconfig.xml, schema.xml) are provided. I thought it should start 
> the SolrCloud server with defaults, but it doesn't. There are no other solr or 
> zookeeper servers running on my machine. 

I looked at SolrCloudTestCase to see how MiniSolrCloudCluster should be
used, then I wrote a little program and configured ivy to pull down
solr-test-framework from 6.0.0 (getting ivy to work right was an
adventure!).  Based on what I found in SolrCloudTestCase, this is the
code I wrote last evening:

import java.nio.file.Paths;

import org.apache.solr.client.solrj.embedded.JettyConfig;
import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.cloud.MiniSolrCloudCluster;

public class MiniSC
{
    static JettyConfig jettyConfig = null;
    static MiniSolrCloudCluster msc = null;
    static CloudSolrClient client = null;

    public static void main(String[] args) throws Exception
    {
        jettyConfig = JettyConfig.builder().setContext("/solr").build();
        msc = new MiniSolrCloudCluster(2, Paths.get("testcluster"), jettyConfig);
        client = msc.getSolrClient();
        client.close();
        msc.shutdown();
    }
}

At first, I saw the same exception you got ... but after a little while
I figured out that this is because I was running the program more than
once without deleting everything in the baseDir -- so the zookeeper
server was starting with an existing database already containing the
solr.xml.  When MiniSolrCloudCluster is used in Solr tests, the baseDir
is newly created for each test class, so this doesn't happen.

When I delete everything in "testcluster" and run my test code, I get
the following in my logfile:

http://apaste.info/Dkw

There are no errors, only WARN and INFO logs.  At this point, I should
be able to use the client object to upload a config to zookeeper, create
a collection, and do other testing.

Thanks,
Shawn



Re: Question on Solr JDBC driver with SQL client like DB Visualizer

2016-04-15 Thread Joel Bernstein
What version of DbVisualizer are you using?

When I tested I was using the latest version.

Joel Bernstein
http://joelsolr.blogspot.com/

On Fri, Apr 15, 2016 at 12:47 PM, Reth RM  wrote:

> output of command :
>
> org/apache/solr/client/solrj/io/sql/
>META-INF/services/java.sql.Driver
>   org/apache/solr/client/solrj/io/sql/ConnectionImpl.class
>  org/apache/solr/client/solrj/io/sql/DatabaseMetaDataImpl.class
>org/apache/solr/client/solrj/io/sql/DriverImpl.class
>  org/apache/solr/client/solrj/io/sql/ResultSetImpl.class
>org/apache/solr/client/solrj/io/sql/ResultSetMetaDataImpl.class
>   org/apache/solr/client/solrj/io/sql/StatementImpl.class
>org/apache/solr/client/solrj/io/sql/package-info.class
>
>
>
> On Fri, Apr 15, 2016 at 9:01 PM, Kevin Risden 
> wrote:
>
> > >
> > > Page 11, the screenshot specifies to select a
> > > "solr-solrj-6.0.0-SNAPSHOT.jar", which is equivalent to
> > > "solr-solrj-6.0.0.jar" shipped with the released version, correct?
> > >
> >
> > Correct, the PDF was generated before 6.0.0 was released. The
> documentation
> > from SOLR-8521 is being migrated to here:
> >
> >
> >
> https://cwiki.apache.org/confluence/display/solr/Parallel+SQL+Interface#ParallelSQLInterface-SQLClientsandDatabaseVisualizationTools
> >
> >
> > > When I try adding that jar, it doesn't show the driver class;
> DBVisualizer
> > > still shows "No new driver class". Does it mean the class is not added
> to
> > > this jar yet?
> > >
> >
> > I checked the Solr 6.0.0 release and the driver is there. I was testing
> it
> > yesterday for a blog series that I'm putting together.
> >
> > Just for reference here is the output for the Solr 6 release:
> >
> > tar -tvf solr-solrj-6.0.0.jar | grep sql
> > drwxrwxrwx  0 0  0   0 Apr  1 14:40
> > org/apache/solr/client/solrj/io/sql/
> > -rwxrwxrwx  0 0  0 842 Apr  1 14:40
> > META-INF/services/java.sql.Driver
> > -rwxrwxrwx  0 0  0   10124 Apr  1 14:40
> > org/apache/solr/client/solrj/io/sql/ConnectionImpl.class
> > -rwxrwxrwx  0 0  0   23557 Apr  1 14:40
> > org/apache/solr/client/solrj/io/sql/DatabaseMetaDataImpl.class
> > -rwxrwxrwx  0 0  04459 Apr  1 14:40
> > org/apache/solr/client/solrj/io/sql/DriverImpl.class
> > -rwxrwxrwx  0 0  0   28333 Apr  1 14:40
> > org/apache/solr/client/solrj/io/sql/ResultSetImpl.class
> > -rwxrwxrwx  0 0  05167 Apr  1 14:40
> > org/apache/solr/client/solrj/io/sql/ResultSetMetaDataImpl.class
> > -rwxrwxrwx  0 0  0   10451 Apr  1 14:40
> > org/apache/solr/client/solrj/io/sql/StatementImpl.class
> > -rwxrwxrwx  0 0  0 141 Apr  1 14:40
> > org/apache/solr/client/solrj/io/sql/package-info.class
> >
> >
> > Kevin Risden
> > Apache Lucene/Solr Committer
> > Hadoop and Search Tech Lead | Avalon Consulting, LLC
> > 
> > M: 732 213 8417
> > LinkedIn  |
> Google+
> >  | Twitter
> > 
> >
> >
> >
> -
> > This message (including any attachments) contains confidential
> information
> > intended for a specific individual and purpose, and is protected by law.
> If
> > you are not the intended recipient, you should delete this message. Any
> > disclosure, copying, or distribution of this message, or the taking of
> any
> > action based on it, is strictly prohibited.
> >
>


Re: Getting duplicate output while doing auto suggestion based on multiple fields using copyField in Solr 5.5

2016-04-15 Thread Tejas Bhanushali
For more info, please find the config file attached.

URL
http://localhost:8983/solr/products/suggest?suggest=true&suggest.build=true&wt=json&suggest.q=Fruit&echoParams=none
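
One client-side workaround, while the index/config cause is tracked down, is to collapse duplicate terms after parsing the suggest response, keeping the highest weight. This is only an illustrative sketch (the class and method names are mine) and does not fix the underlying duplication:

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch: collapse duplicate suggestion terms client-side, keeping the
// highest weight for each term. Input lists mimic a parsed suggest response.
public class DedupeSuggestions {
    static Map<String, Long> dedupe(List<String> terms, List<Long> weights) {
        Map<String, Long> best = new LinkedHashMap<>();  // preserves first-seen order
        for (int i = 0; i < terms.size(); i++) {
            // keep the maximum weight seen for each distinct term
            best.merge(terms.get(i), weights.get(i), Math::max);
        }
        return best;
    }

    public static void main(String[] args) {
        List<String> terms = Arrays.asList(
                "Fruits & Vegetables", "Fruits & Vegetables", "Cut Fruits", "Fruits");
        List<Long> weights = Arrays.asList(1000L, 980L, 456L, 456L);
        System.out.println(dedupe(terms, weights));
        // {Fruits & Vegetables=1000, Cut Fruits=456, Fruits=456}
    }
}
```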

On Fri, Apr 15, 2016 at 11:18 PM, Tejas Bhanushali <
contact.tejasbhanush...@gmail.com> wrote:

> Hi Team,
>
> I'm getting duplicate results when I do auto-suggestion based on
> multiple fields using copyField. I have the table configuration below:
>
> Segment -- has multiple categories -- each has multiple sub-categories --
> each has multiple products.
>
> Suggestions are given based on
> segment name, category name, sub-category name and product name.
>
> Below is the output.
>
> ---
>
> {
>   "responseHeader": {
> "status": 0,
> "QTime": 1375
>   },
>   "command": "build",
>   "suggest": {
> "mySuggester": {
>   "Fruit": {
> "numFound": 10,
> "suggestions": [
>   {
> "term": "Fruits & Vegetables",
> "weight": 1000,
> "payload": ""
>   },
>   {
> "term": "Fruits & Vegetables",
> "weight": 1000,
> "payload": ""
>   },
>   {
> "term": "Fruits & Vegetables",
> "weight": 980,
> "payload": ""
>   },
>   {
> "term": "Fruits & Vegetables",
> "weight": 980,
> "payload": ""
>   },
>   {
> "term": "Fruits & Vegetables",
> "weight": 800,
> "payload": ""
>   },
>   {
> "term": "Fruits & Vegetables",
> "weight": 588,
> "payload": ""
>   },
>   {
> "term": "Cut Fruits",
> "weight": 456,
> "payload": ""
>   },
>   {
> "term": "Fruits",
> "weight": 456,
> "payload": ""
>   },
>   {
> "term": "Fruits & Vegetables",
> "weight": 456,
> "payload": ""
>   },
>   {
> "term": "Fruits",
> "weight": 456,
> "payload": ""
>   }
> ]
>   }
> }
>   }
> }
>
> --
> Thanks & Regards,
>
> Tejas Bhanushali
>



-- 
Thanks & Regards,

Tejas Bhanushali

 
 
[The attached db-data-config.xml and solrconfig.xml lost most of their XML
markup when archived. The recoverable pieces: the DIH product entity selects
product fields (including an image URL built from
https://s3.amazonaws.com/instadelibucket/) with "... from product where
status = 2 limit 1000" and a deltaQuery on last_modified; and the suggester
section of the solrconfig.xml (Solr 5.5.0) was approximately:]

<searchComponent name="suggest" class="solr.SuggestComponent">
  <lst name="suggester">
    <str name="name">mySuggester</str>
    <str name="lookupImpl">AnalyzingInfixLookupFactory</str>
    <str name="dictionaryImpl">DocumentDictionaryFactory</str>
    <str name="field">text</str>
    <str name="weightField">product_cost</str>
    <str name="suggestAnalyzerFieldType">text_auto</str>
    <str name="buildOnStartup">false</str>
    <str name="buildOnCommit">false</str>
  </lst>
</searchComponent>

<requestHandler name="/suggest" class="solr.SearchHandler" startup="lazy">
  <lst name="defaults">
    <str name="suggest">true</str>
    <str name="suggest.count">10</str>
    <str name="suggest.dictionary">mySuggester</str>
  </lst>
  <arr name="components">
    <str>suggest</str>
  </arr>
</requestHandler>
  
  


Getting duplicate output while doing auto suggestion based on multiple fields using copyField in Solr 5.5

2016-04-15 Thread Tejas Bhanushali
Hi Team,

I'm getting duplicate results when I do auto-suggestion based on multiple
fields using copyField. I have the table configuration below:

Segment -- has multiple categories -- each has multiple sub-categories --
each has multiple products.

Suggestions are given based on
segment name, category name, sub-category name and product name.

Below is the output.

---

{
  "responseHeader": {
"status": 0,
"QTime": 1375
  },
  "command": "build",
  "suggest": {
"mySuggester": {
  "Fruit": {
"numFound": 10,
"suggestions": [
  {
"term": "Fruits & Vegetables",
"weight": 1000,
"payload": ""
  },
  {
"term": "Fruits & Vegetables",
"weight": 1000,
"payload": ""
  },
  {
"term": "Fruits & Vegetables",
"weight": 980,
"payload": ""
  },
  {
"term": "Fruits & Vegetables",
"weight": 980,
"payload": ""
  },
  {
"term": "Fruits & Vegetables",
"weight": 800,
"payload": ""
  },
  {
"term": "Fruits & Vegetables",
"weight": 588,
"payload": ""
  },
  {
"term": "Cut Fruits",
"weight": 456,
"payload": ""
  },
  {
"term": "Fruits",
"weight": 456,
"payload": ""
  },
  {
"term": "Fruits & Vegetables",
"weight": 456,
"payload": ""
  },
  {
"term": "Fruits",
"weight": 456,
"payload": ""
  }
]
  }
}
  }
}

-- 
Thanks & Regards,

Tejas Bhanushali


Re: Adding replica on solr - 5.50

2016-04-15 Thread Jay Potharaju
I have multiple Solr instances running in my dev sandbox. When adding a
replica I was passing the host IP instead of 127.0.1.1, which is recorded in
the live nodes section.
Thanks Erick for pointing that out.

Working URL:
http://x.x.x.x:9000/solr/admin/collections?action=ADDREPLICA&collection=test4&shard=shard2&node=127.0.1.1:9000_solr

Thanks
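
The fix above hinges on using the node name exactly as registered in ZooKeeper's live_nodes, which has the form host:port_context (e.g. 127.0.1.1:9000_solr). A small illustrative helper for converting between that form and a base URL (the names are mine, not a SolrJ API):

```java
// Sketch: convert between a ZooKeeper live_nodes name (host:port_context)
// and a plain base URL. Assumes an http scheme and a single-segment context.
public class NodeName {
    static String toBaseUrl(String nodeName) {
        // the last '_' separates host:port from the servlet context
        int idx = nodeName.lastIndexOf('_');
        String hostPort = nodeName.substring(0, idx);
        String context = nodeName.substring(idx + 1);
        return "http://" + hostPort + "/" + context;
    }

    static String toNodeName(String baseUrl) {
        // strip the scheme, then swap the context separator back to '_'
        String rest = baseUrl.replaceFirst("^https?://", "");
        int slash = rest.indexOf('/');
        return rest.substring(0, slash) + "_" + rest.substring(slash + 1);
    }

    public static void main(String[] args) {
        System.out.println(toBaseUrl("127.0.1.1:9000_solr"));         // http://127.0.1.1:9000/solr
        System.out.println(toNodeName("http://127.0.1.1:9000/solr")); // 127.0.1.1:9000_solr
    }
}
```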


On Fri, Apr 15, 2016 at 10:19 AM, John Bickerstaff  wrote:

> Oh, and what, if any directories need to exist for the ADDREPLICA command
> to work?
>
> Hopefully nothing past the already existing /var/solr/data created by the
> Solr install script?
>
> On Fri, Apr 15, 2016 at 11:18 AM, John Bickerstaff <
> j...@johnbickerstaff.com
> > wrote:
>
> > Oh, and what, if any directories need to exist for the ADDREPLICA
> >
> > On Fri, Apr 15, 2016 at 11:09 AM, John Bickerstaff <
> > j...@johnbickerstaff.com> wrote:
> >
> >> Thanks again Eric - I'm going to be trying the ADDREPLICA again today or
> >> Monday.  I much prefer that to hand-edit hackery...
> >>
> >> Thanks also for pointing out that cURL makes it "scriptable"...
> >>
> >> On Fri, Apr 15, 2016 at 10:50 AM, Erick Erickson <
> erickerick...@gmail.com
> >> > wrote:
> >>
> >>> bq: Shouldn't this: &node=x.x.x.x:9001_solr
> >>> <
> >>>
> http://x.x.x.x:8984/solr/admin/collections?action=ADDREPLICA&collection=test2&shard=shard1&node=x.x.x.x:9001_solr
> >>> >
> >>>
> >>> Actually be this?  &node=x.x.x.x:9001/solr
> >>> <
> >>>
> http://x.x.x.x:8984/solr/admin/collections?action=ADDREPLICA&collection=test2&shard=shard1&node=x.x.x.x:9001_solr
> >>> >
> >>>
> >>> (Note the / instead of _ )
> >>>
> >>> Good thing you added the note, 'cause I was having trouble seeing the
> >>> difference.
> >>>
> >>> No. The underscore is correct. The "node" in this case is the name
> >>> registered
> >>> in Zookeeper in the "live nodes" znode, _not_ a URL or whatever...
> >>>
> >>> As to your two methods of moving a shard around. Either one is fine,
> >>> although the first one (copying the directory and "doing the right
> thing"
> >>> to edit core.properties) is a little dicier in that you're doing hand
> >>> edits.
> >>>
> >>> Personally I prefer the ADDREPLICA solution. In fact I've moved
> replicas
> >>> around by ADDREPLICA, wait, DELETEREPLICA...
> >>>
> >>> Best,
> >>> Erick
> >>>
> >>> On Fri, Apr 15, 2016 at 3:10 AM, Jaroslaw Rozanski
> >>>  wrote:
> >>> > Hi,
> >>> >
> >>> > Does the `node=...` actually work for you? When attempting similar
> >>> with
> >>> > Solr 5.3.1, despite what documentation said, I had to use
> >>> > `node_name=...`.
> >>> >
> >>> >
> >>> > Thanks,
> >>> > Jarek
> >>> >
> >>> > On Fri, 15 Apr 2016, at 05:48, John Bickerstaff wrote:
> >>> >> Another thought - again probably not it, but just in case...
> >>> >>
> >>> >> Shouldn't this: &node=x.x.x.x:9001_solr
> >>> >> <
> >>>
> http://x.x.x.x:8984/solr/admin/collections?action=ADDREPLICA&collection=test2&shard=shard1&node=x.x.x.x:9001_solr
> >>> >
> >>> >>
> >>> >> Actually be this?  &node=x.x.x.x:9001/solr
> >>> >> <
> >>>
> http://x.x.x.x:8984/solr/admin/collections?action=ADDREPLICA&collection=test2&shard=shard1&node=x.x.x.x:9001_solr
> >>> >
> >>> >>
> >>> >> (Note the / instead of _ )
> >>> >>
> >>> >> On Thu, Apr 14, 2016 at 10:45 PM, John Bickerstaff
> >>> >>  >>> >> > wrote:
> >>> >>
> >>> >> > Jay - it's probably too simple, but the error says "not currently
> >>> active"
> >>> >> > which could, of course, mean that although it's up and running,
> >>> it's not
> >>> >> > listening on the port you have in the command line...  Or that the
> >>> port is
> >>> >> > blocked by a firewall or other network problem.
> >>> >> >
> >>> >> > I note that you're using ports different from the default 8983 for
> >>> your
> >>> >> > Solr instances...
> >>> >> >
> >>> >> > You probably checked already, but I thought I'd mention it.
> >>> >> >
> >>> >> >
> >>> >> > On Thu, Apr 14, 2016 at 8:30 PM, John Bickerstaff <
> >>> >> > j...@johnbickerstaff.com> wrote:
> >>> >> >
> >>> >> >> Thanks Eric!
> >>> >> >>
> >>> >> >> I'll look into that immediately - yes, I think that cURL would
> >>> qualify as
> >>> >> >> scriptable for my IT lead.
> >>> >> >>
> >>> >> >> In the end, I found I could do it two ways...
> >>> >> >>
> >>> >> >> Either copy the entire solr data directory over to /var/solr/data
> >>> on the
> >>> >> >> new machine, change the directory name and the entries in the
> >>> >> >> core.properties file, then start the already-installed Solr in
> >>> cloud mode -
> >>> >> >> everything came up roses in the cloud section of the UI - the new
> >>> replica
> >>> >> >> was there as part of the collection, properly named and worked
> >>> fine.
> >>> >> >>
> >>> >> >> Alternatively, I used the command I mentioned earlier and then
> >>> waited as
> >>> >> >> the data was replicated over to the newly-created replica --
> again,
> >>> >> >> everything was roses in the Cloud section of the Admin UI...
> >>> >> >>
> >>> >> >> What might I have messed up in this scenario?  I didn't love the
> >>> hackish
> >>> >> >> feeling 

Re: Adding replica on solr - 5.50

2016-04-15 Thread John Bickerstaff
Oh, and what, if any directories need to exist for the ADDREPLICA command
to work?

Hopefully nothing past the already existing /var/solr/data created by the
Solr install script?

On Fri, Apr 15, 2016 at 11:18 AM, John Bickerstaff  wrote:

> Oh, and what, if any directories need to exist for the ADDREPLICA
>
> On Fri, Apr 15, 2016 at 11:09 AM, John Bickerstaff <
> j...@johnbickerstaff.com> wrote:
>
>> Thanks again Eric - I'm going to be trying the ADDREPLICA again today or
>> Monday.  I much prefer that to hand-edit hackery...
>>
>> Thanks also for pointing out that cURL makes it "scriptable"...
>>
>> On Fri, Apr 15, 2016 at 10:50 AM, Erick Erickson > > wrote:
>>
>>> bq: Shouldn't this: &node=x.x.x.x:9001_solr
>>> <
>>> http://x.x.x.x:8984/solr/admin/collections?action=ADDREPLICA&collection=test2&shard=shard1&node=x.x.x.x:9001_solr
>>> >
>>>
>>> Actually be this?  &node=x.x.x.x:9001/solr
>>> <
>>> http://x.x.x.x:8984/solr/admin/collections?action=ADDREPLICA&collection=test2&shard=shard1&node=x.x.x.x:9001_solr
>>> >
>>>
>>> (Note the / instead of _ )
>>>
>>> Good thing you added the note, 'cause I was having trouble seeing the
>>> difference.
>>>
>>> No. The underscore is correct. The "node" in this case is the name
>>> registered
>>> in Zookeeper in the "live nodes" znode, _not_ a URL or whatever...
>>>
>>> As to your two methods of moving a shard around. Either one is fine,
>>> although the first one (copying the directory and "doing the right thing"
>>> to edit core.properties) is a little dicier in that you're doing hand
>>> edits.
>>>
>>> Personally I prefer the ADDREPLICA solution. In fact I've moved replicas
>>> around by ADDREPLICA, wait, DELETEREPLICA...
>>>
>>> Best,
>>> Erick
>>>
>>> On Fri, Apr 15, 2016 at 3:10 AM, Jaroslaw Rozanski
>>>  wrote:
>>> > Hi,
>>> >
>>> > Does the `node=...` actually work for you? When attempting similar
>>> with
>>> > Solr 5.3.1, despite what documentation said, I had to use
>>> > `node_name=...`.
>>> >
>>> >
>>> > Thanks,
>>> > Jarek
>>> >
>>> > On Fri, 15 Apr 2016, at 05:48, John Bickerstaff wrote:
>>> >> Another thought - again probably not it, but just in case...
>>> >>
>>> >> Shouldn't this: &node=x.x.x.x:9001_solr
>>> >> <
>>> http://x.x.x.x:8984/solr/admin/collections?action=ADDREPLICA&collection=test2&shard=shard1&node=x.x.x.x:9001_solr
>>> >
>>> >>
>>> >> Actually be this?  &node=x.x.x.x:9001/solr
>>> >> <
>>> http://x.x.x.x:8984/solr/admin/collections?action=ADDREPLICA&collection=test2&shard=shard1&node=x.x.x.x:9001_solr
>>> >
>>> >>
>>> >> (Note the / instead of _ )
>>> >>
>>> >> On Thu, Apr 14, 2016 at 10:45 PM, John Bickerstaff
>>> >> >> >> > wrote:
>>> >>
>>> >> > Jay - it's probably too simple, but the error says "not currently
>>> active"
>>> >> > which could, of course, mean that although it's up and running,
>>> it's not
>>> >> > listening on the port you have in the command line...  Or that the
>>> port is
>>> >> > blocked by a firewall or other network problem.
>>> >> >
>>> >> > I note that you're using ports different from the default 8983 for
>>> your
>>> >> > Solr instances...
>>> >> >
>>> >> > You probably checked already, but I thought I'd mention it.
>>> >> >
>>> >> >
>>> >> > On Thu, Apr 14, 2016 at 8:30 PM, John Bickerstaff <
>>> >> > j...@johnbickerstaff.com> wrote:
>>> >> >
>>> >> >> Thanks Eric!
>>> >> >>
>>> >> >> I'll look into that immediately - yes, I think that cURL would
>>> qualify as
>>> >> >> scriptable for my IT lead.
>>> >> >>
>>> >> >> In the end, I found I could do it two ways...
>>> >> >>
>>> >> >> Either copy the entire solr data directory over to /var/solr/data
>>> on the
>>> >> >> new machine, change the directory name and the entries in the
>>> >> >> core.properties file, then start the already-installed Solr in
>>> cloud mode -
>>> >> >> everything came up roses in the cloud section of the UI - the new
>>> replica
>>> >> >> was there as part of the collection, properly named and worked
>>> fine.
>>> >> >>
>>> >> >> Alternatively, I used the command I mentioned earlier and then
>>> waited as
>>> >> >> the data was replicated over to the newly-created replica -- again,
>>> >> >> everything was roses in the Cloud section of the Admin UI...
>>> >> >>
>>> >> >> What might I have messed up in this scenario?  I didn't love the
>>> hackish
>>> >> >> feeling either, but had been unable to find anything like the
>>> addreplica -
>>> >> >> although I did look for a fairly long time - I'm glad to know
>>> about it now.
>>> >> >>
>>> >> >>
>>> >> >>
>>> >> >> On Thu, Apr 14, 2016 at 7:36 PM, Erick Erickson <
>>> erickerick...@gmail.com>
>>> >> >> wrote:
>>> >> >>
>>> >> >>> bq:  the Solr site about how to add a
>>> >> >>> replica to a Solr cloud.  The Admin UI appears to require that the
>>> >> >>> directories be created anyway
>>> >> >>>
>>> >> >>> No, no, a thousand times NO! You're getting confused,
>>> >> >>> I think, with the difference between _cores_ and _collections_
>>> >> >>> (or replicas in a collection).
>>> >> >>>
>>> >> >>> 

Re: Adding replica on solr - 5.50

2016-04-15 Thread John Bickerstaff
Oh, and what, if any directories need to exist for the ADDREPLICA

On Fri, Apr 15, 2016 at 11:09 AM, John Bickerstaff  wrote:

> Thanks again Eric - I'm going to be trying the ADDREPLICA again today or
> Monday.  I much prefer that to hand-edit hackery...
>
> Thanks also for pointing out that cURL makes it "scriptable"...
>
> On Fri, Apr 15, 2016 at 10:50 AM, Erick Erickson 
> wrote:
>
>> bq: Shouldn't this: &node=x.x.x.x:9001_solr
>> <
>> http://x.x.x.x:8984/solr/admin/collections?action=ADDREPLICA&collection=test2&shard=shard1&node=x.x.x.x:9001_solr
>> >
>>
>> Actually be this?  &node=x.x.x.x:9001/solr
>> <
>> http://x.x.x.x:8984/solr/admin/collections?action=ADDREPLICA&collection=test2&shard=shard1&node=x.x.x.x:9001_solr
>> >
>>
>> (Note the / instead of _ )
>>
>> Good thing you added the note, 'cause I was having trouble seeing the
>> difference.
>>
>> No. The underscore is correct. The "node" in this case is the name
>> registered
>> in Zookeeper in the "live nodes" znode, _not_ a URL or whatever...
>>
>> As to your two methods of moving a shard around. Either one is fine,
>> although the first one (copying the directory and "doing the right thing"
>> to edit core.properties) is a little dicier in that you're doing hand
>> edits.
>>
>> Personally I prefer the ADDREPLICA solution. In fact I've moved replicas
>> around by ADDREPLICA, wait, DELETEREPLICA...
>>
>> Best,
>> Erick
>>
>> On Fri, Apr 15, 2016 at 3:10 AM, Jaroslaw Rozanski
>>  wrote:
>> > Hi,
>> >
>> > Does the `node=...` actually work for you? When attempting similar with
>> > Solr 5.3.1, despite what documentation said, I had to use
>> > `node_name=...`.
>> >
>> >
>> > Thanks,
>> > Jarek
>> >
>> > On Fri, 15 Apr 2016, at 05:48, John Bickerstaff wrote:
>> >> Another thought - again probably not it, but just in case...
>> >>
>> >> Shouldn't this: &node=x.x.x.x:9001_solr
>> >> <
>> http://x.x.x.x:8984/solr/admin/collections?action=ADDREPLICA&collection=test2&shard=shard1&node=x.x.x.x:9001_solr
>> >
>> >>
>> >> Actually be this?  &node=x.x.x.x:9001/solr
>> >> <
>> http://x.x.x.x:8984/solr/admin/collections?action=ADDREPLICA&collection=test2&shard=shard1&node=x.x.x.x:9001_solr
>> >
>> >>
>> >> (Note the / instead of _ )
>> >>
>> >> On Thu, Apr 14, 2016 at 10:45 PM, John Bickerstaff
>> >> > >> > wrote:
>> >>
>> >> > Jay - it's probably too simple, but the error says "not currently
>> active"
>> >> > which could, of course, mean that although it's up and running, it's
>> not
>> >> > listening on the port you have in the command line...  Or that the
>> port is
>> >> > blocked by a firewall or other network problem.
>> >> >
>> >> > I note that you're using ports different from the default 8983 for
>> your
>> >> > Solr instances...
>> >> >
>> >> > You probably checked already, but I thought I'd mention it.
>> >> >
>> >> >
>> >> > On Thu, Apr 14, 2016 at 8:30 PM, John Bickerstaff <
>> >> > j...@johnbickerstaff.com> wrote:
>> >> >
>> >> >> Thanks Eric!
>> >> >>
>> >> >> I'll look into that immediately - yes, I think that cURL would
>> qualify as
>> >> >> scriptable for my IT lead.
>> >> >>
>> >> >> In the end, I found I could do it two ways...
>> >> >>
>> >> >> Either copy the entire solr data directory over to /var/solr/data
>> on the
>> >> >> new machine, change the directory name and the entries in the
>> >> >> core.properties file, then start the already-installed Solr in
>> cloud mode -
>> >> >> everything came up roses in the cloud section of the UI - the new
>> replica
>> >> >> was there as part of the collection, properly named and worked fine.
>> >> >>
>> >> >> Alternatively, I used the command I mentioned earlier and then
>> waited as
>> >> >> the data was replicated over to the newly-created replica -- again,
>> >> >> everything was roses in the Cloud section of the Admin UI...
>> >> >>
>> >> >> What might I have messed up in this scenario?  I didn't love the
>> hackish
>> >> >> feeling either, but had been unable to find anything like the
>> addreplica -
>> >> >> although I did look for a fairly long time - I'm glad to know about
>> it now.
>> >> >>
>> >> >>
>> >> >>
>> >> >> On Thu, Apr 14, 2016 at 7:36 PM, Erick Erickson <
>> erickerick...@gmail.com>
>> >> >> wrote:
>> >> >>
>> >> >>> bq:  the Solr site about how to add a
>> >> >>> replica to a Solr cloud.  The Admin UI appears to require that the
>> >> >>> directories be created anyway
>> >> >>>
>> >> >>> No, no, a thousand times NO! You're getting confused,
>> >> >>> I think, with the difference between _cores_ and _collections_
>> >> >>> (or replicas in a collection).
>> >> >>>
>> >> >>> Do not use the admin UI for _cores_ to create replicas. It's
>> possible
>> >> >>> if (and only if) you do it exactly correctly. Instead, use the
>> >> >>> collections API
>> >> >>> ADDREPLICA command here:
>> >> >>>
>> >> >>>
>> https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-api_addreplica
>> >> >>>
>> >> >>> Which you could cURL etc., does that qualify as "scripting" in your

Re: Adding replica on solr - 5.50

2016-04-15 Thread John Bickerstaff
Thanks again Eric - I'm going to be trying the ADDREPLICA again today or
Monday.  I much prefer that to hand-edit hackery...

Thanks also for pointing out that cURL makes it "scriptable"...
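For anyone scripting this later: the ADDREPLICA call discussed in this thread boils down to one Collections API URL. A small helper (hosts, ports, and collection name below are placeholders) makes the `node` quirk — `host:port_solr`, underscore and all — harder to get wrong:

```python
def addreplica_url(admin_host, collection, shard, target_node):
    """Build a Collections API ADDREPLICA URL.

    target_node must be the node name as registered in ZooKeeper's
    live_nodes znode, e.g. "10.0.0.2:9001_solr" (underscore, not slash).
    """
    return (
        "http://%s/solr/admin/collections"
        "?action=ADDREPLICA&collection=%s&shard=%s&node=%s"
        % (admin_host, collection, shard, target_node)
    )

# Example with placeholder addresses:
url = addreplica_url("10.0.0.1:8984", "test2", "shard1", "10.0.0.2:9001_solr")
print(url)
```

The resulting URL can then be issued with cURL. Note Jarek's report earlier in the thread that on 5.3.1 he needed `node_name=` rather than `node=`, so check against your version.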

On Fri, Apr 15, 2016 at 10:50 AM, Erick Erickson 
wrote:

> bq: Shouldn't this: =x.x.x.x:9001_solr
> <
> http://x.x.x.x:8984/solr/admin/collections?action=ADDREPLICA=test2=shard1=x.x.x.x:9001_solr
> >
>
> Actually be this?  =x.x.x.x:9001/solr
> <
> http://x.x.x.x:8984/solr/admin/collections?action=ADDREPLICA=test2=shard1=x.x.x.x:9001_solr
> >
>
> (Note the / instead of _ )
>
> Good thing you added the note, 'cause I was having trouble seeing the
> difference.
>
> No. The underscore is correct. The "node" in this case is the name
> registered
> in Zookeeper in the "live nodes" znode, _not_ a URL or whatever...
>
> As to your two methods of moving a shard around. Either one is fine,
> although the first one (copying the directory and "doing the right thing"
> to edit core.properties) is a little dicier in that you're doing hand
> edits.
>
> Personally I prefer the ADDREPLICA solution. In fact I've moved replicas
> around by ADDREPLICA, wait, DELETEREPLICA...
>
> Best,
> Erick
>
> On Fri, Apr 15, 2016 at 3:10 AM, Jaroslaw Rozanski
>  wrote:
> > Hi,
> >
> > Does the `=...` actually work for you? When attempting similar with
> > Solr 5.3.1, despite what documentation said, I had to use
> > `node_name=...`.
> >
> >
> > Thanks,
> > Jarek
> >
> > On Fri, 15 Apr 2016, at 05:48, John Bickerstaff wrote:
> >> Another thought - again probably not it, but just in case...
> >>
> >> Shouldn't this: =x.x.x.x:9001_solr
> >> <
> http://x.x.x.x:8984/solr/admin/collections?action=ADDREPLICA=test2=shard1=x.x.x.x:9001_solr
> >
> >>
> >> Actually be this?  =x.x.x.x:9001/solr
> >> <
> http://x.x.x.x:8984/solr/admin/collections?action=ADDREPLICA=test2=shard1=x.x.x.x:9001_solr
> >
> >>
> >> (Note the / instead of _ )
> >>
> >> On Thu, Apr 14, 2016 at 10:45 PM, John Bickerstaff
> >>  >> > wrote:
> >>
> >> > Jay - it's probably too simple, but the error says "not currently
> active"
> >> > which could, of course, mean that although it's up and running, it's
> not
> >> > listening on the port you have in the command line...  Or that the
> port is
> >> > blocked by a firewall or other network problem.
> >> >
> >> > I note that you're using ports different from the default 8983 for
> your
> >> > Solr instances...
> >> >
> >> > You probably checked already, but I thought I'd mention it.
> >> >
> >> >
> >> > On Thu, Apr 14, 2016 at 8:30 PM, John Bickerstaff <
> >> > j...@johnbickerstaff.com> wrote:
> >> >
> >> >> Thanks Eric!
> >> >>
> >> >> I'll look into that immediately - yes, I think that cURL would
> qualify as
> >> >> scriptable for my IT lead.
> >> >>
> >> >> In the end, I found I could do it two ways...
> >> >>
> >> >> Either copy the entire solr data directory over to /var/solr/data on
> the
> >> >> new machine, change the directory name and the entries in the
> >> >> core.properties file, then start the already-installed Solr in cloud
> mode -
> >> >> everything came up roses in the cloud section of the UI - the new
> replica
> >> >> was there as part of the collection, properly named and worked fine.
> >> >>
> >> >> Alternatively, I used the command I mentioned earlier and then
> waited as
> >> >> the data was replicated over to the newly-created replica -- again,
> >> >> everything was roses in the Cloud section of the Admin UI...
> >> >>
> >> >> What might I have messed up in this scenario?  I didn't love the
> hackish
> >> >> feeling either, but had been unable to find anything like the
> addreplica -
> >> >> although I did look for a fairly long time - I'm glad to know about
> it now.
> >> >>
> >> >>
> >> >>
> >> >> On Thu, Apr 14, 2016 at 7:36 PM, Erick Erickson <
> erickerick...@gmail.com>
> >> >> wrote:
> >> >>
> >> >>> bq:  the Solr site about how to add a
> >> >>> replica to a Solr cloud.  The Admin UI appears to require that the
> >> >>> directories be created anyway
> >> >>>
> >> >>> No, no, a thousand times NO! You're getting confused,
> >> >>> I think, with the difference between _cores_ and _collections_
> >> >>> (or replicas in a collection).
> >> >>>
> >> >>> Do not use the admin UI for _cores_ to create replicas. It's
> possible
> >> >>> if (and only if) you do it exactly correctly. Instead, use the
> >> >>> collections API
> >> >>> ADDREPLICA command here:
> >> >>>
> >> >>>
> https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-api_addreplica
> >> >>>
> >> >>> Which you could cURL etc., does that qualify as "scripting" in your
> >> >>> situation?
> >> >>>
> >> >>> You're right, the Solr instance must be up and running for the
> replica to
> >> >>> be added, but that's not onerous
> >> >>>
> >> >>>
> >> >>> The bin/solr script is a "work in progress", and doesn't have direct
> >> >>> support
> >> >>> for "addreplica", 

Re: Re: solr 5.2.1, data import issue, shown processed rows doesn't match acturally indexed doc quantity.

2016-04-15 Thread Erick Erickson
The simplest test to see if there are duplicates is to
check the maxDoc and numDocs in the admin UI. If
they're different then you have duplicates. NOTE:
this is not definitive, and you MUST NOT run optimize
before you look. But it's quick. I'd delete all docs
before trying this first though.
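Erick's quick check can be expressed in a couple of lines: right after a fresh import (no optimize, no other deletes), any gap between maxDoc and numDocs is documents that were overwritten — i.e. repeated IDs. A heuristic, not proof:

```python
def overwritten_doc_count(num_docs, max_doc):
    """Deleted-but-not-purged docs. Immediately after a clean reindex
    with no explicit deletes, these are typically documents that were
    replaced because a uniqueKey value repeated."""
    return max_doc - num_docs

# e.g. the numbers from this thread, read off the core admin page:
print(overwritten_doc_count(163349, 165191))  # -> 1842
```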

Second, you say "there are no errors in any logs". Are
you completely sure that some of the docs didn't have errors?
Just double checking here since there can be a bunch
of logs as they're rolled over. And I'm thinking of the Solr
logs here.

And do note that the UUID field (assuming we're talking
the UUIDUpdateProcessorFactory here) only adds a UUID
if the field is _not_ present in the doc. Even if the field is
empty. The test is something like
if (inputDoc.get(field) == null) {
    add the UUID field
}

So even if the doc has the empty string as the UUID field,
no new UUID field will be added...
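In Python terms, the behavior described above (a paraphrase of the factory's logic, not the actual Solr source) looks like this — note that an empty string counts as "present", so no UUID is generated for it:

```python
import uuid

def add_uuid_if_absent(doc, field="id"):
    """Mimic UUIDUpdateProcessorFactory: only fill the field when it is
    entirely missing; an existing value -- even "" -- is left alone."""
    if doc.get(field) is None:
        doc[field] = str(uuid.uuid4())
    return doc

print(add_uuid_if_absent({"title": "a"}))            # gets a generated id
print(add_uuid_if_absent({"id": "", "title": "b"}))  # "" is kept as-is
```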

Best,
Erick

On Thu, Apr 14, 2016 at 11:51 PM, cqlangyi  wrote:
> hi guys,
>
>
> thank you very much for the help. sorry been so lated to reply.
>
>
> 1. "commit" didn't help.
> after commit, the 'numFound' of "*:*" query is still the same.
>
>
> 2. "id" field in every doc is generated by solr using UUID; i have no
> idea how to check if there is a duplicated one, but i assume
> there shouldn't be, unless solr cloud has some known bug when
> using UUID in a distributed environment.
>
>
> the environment is
>
>
> solr cloud with:
> 3 linux boxes, use zookeeper 3.4.6  + solr 5.2.1, oracle JDK 1.7.80
>
>
> any ideas?
>
>
> thank you very much.
>
>
>
>
>
>
> At 2016-04-05 12:09:14, "John Bickerstaff"  wrote:
>>Both of us implied it, but to be completely clear - if you have a duplicate
>>ID in your data set, SOLR will throw away previous documents with that ID
>>and index the new one.  That's fine if your duplicates really are
>>duplicates - it's not OK if there's a problem in the data set and the
>>duplicates ID's are on documents that are actually unique.
>>
>>On Mon, Apr 4, 2016 at 9:51 PM, John Bickerstaff 
>>wrote:
>>
>>> Sweet - that's a good point - I ran into that too - I had not run the
>>> commit for the last "batch" (I was using SolrJ) and so numbers didn't match
>>> until I did.
>>>
>>> On Mon, Apr 4, 2016 at 9:50 PM, Binoy Dalal 
>>> wrote:
>>>
 1) Are you sure you don't have duplicates?
 2) All of your records might have been indexed but a new searcher may not
 have opened on the updated index yet. Try issuing a commit and see if that
 works.

 On Tue, 5 Apr 2016, 08:56 cqlangyi,  wrote:

 > hi there,
 >
 >
 > i have an solr 5.2.1,  when i do data import, after the job is done,
 it's
 > shown 165,191 rows processed successfully.
 >
 >
 > but when i query with *:*, the "numFound" shown only 163,349 docs in
 index.
 >
 >
 > when i tred to do it again, , it's shown 165,191 rows processed
 > successfully. but the *:* query result now is 162,390.
 >
 >
 > no errors in any log,
 >
 >
 > any idea?
 >
 >
 > thank you very much!
 >
 >
 > cq
 >
 >
 >
 >
 >
 >
 >
 >
 > At 2016-04-05 09:19:48, "Chris Hostetter" 
 > wrote:
 > >
 > >: I am not sure how to use "Sort By Function" for Case.
 > >:
 > >:
 |10#40|14#19|33#17|27#6|15#6|19#5|7#2|6#1|29#1|5#1|30#1|28#1|12#0|20#0|
 > >:
 > >: Can you tell how to fetch 40 when input is 10.
 > >
 > >Something like...
 > >
 >
 >
 >if(termfreq(f,10),40,if(termfreq(f,14),19,if(termfreq(f,33),17,)))
 > >
 > >But i suspect there may be a much better way to achieve your ultimate
 goal
 > >if you tell us what it is.  what do these fields represent? what makes
 > >these numeric valuessignificant? do you know which values are
 significant
 > >when indexing, or do they vary for every query?
 > >
 > >https://people.apache.org/~hossman/#xyproblem
 > >XY Problem
 > >
 > >Your question appears to be an "XY Problem" ... that is: you are
 dealing
 > >with "X", you are assuming "Y" will help you, and you are asking about
 "Y"
 > >without giving more details about the "X" so that we can understand the
 > >full issue.  Perhaps the best solution doesn't involve "Y" at all?
 > >See Also: http://www.perlmonks.org/index.pl?node_id=542341
 > >
 > >
 > >
 > >
 > >-Hoss
 > >http://www.lucidworks.com/
 >
 --
 Regards,
 Binoy Dalal

>>>
>>>


Re: Adding replica on solr - 5.50

2016-04-15 Thread Erick Erickson
bq: Shouldn't this: =x.x.x.x:9001_solr


Actually be this?  =x.x.x.x:9001/solr


(Note the / instead of _ )

Good thing you added the note, 'cause I was having trouble seeing the
difference.

No. The underscore is correct. The "node" in this case is the name registered
in Zookeeper in the "live nodes" znode, _not_ a URL or whatever...

As to your two methods of moving a shard around. Either one is fine,
although the first one (copying the directory and "doing the right thing"
to edit core.properties) is a little dicier in that you're doing hand edits.

Personally I prefer the ADDREPLICA solution. In fact I've moved replicas
around by ADDREPLICA, wait, DELETEREPLICA...

Best,
Erick

On Fri, Apr 15, 2016 at 3:10 AM, Jaroslaw Rozanski
 wrote:
> Hi,
>
> Does the `=...` actually work for you? When attempting similar with
> Solr 5.3.1, despite what documentation said, I had to use
> `node_name=...`.
>
>
> Thanks,
> Jarek
>
> On Fri, 15 Apr 2016, at 05:48, John Bickerstaff wrote:
>> Another thought - again probably not it, but just in case...
>>
>> Shouldn't this: =x.x.x.x:9001_solr
>> 
>>
>> Actually be this?  =x.x.x.x:9001/solr
>> 
>>
>> (Note the / instead of _ )
>>
>> On Thu, Apr 14, 2016 at 10:45 PM, John Bickerstaff
>> > > wrote:
>>
>> > Jay - it's probably too simple, but the error says "not currently active"
>> > which could, of course, mean that although it's up and running, it's not
>> > listening on the port you have in the command line...  Or that the port is
>> > blocked by a firewall or other network problem.
>> >
>> > I note that you're using ports different from the default 8983 for your
>> > Solr instances...
>> >
>> > You probably checked already, but I thought I'd mention it.
>> >
>> >
>> > On Thu, Apr 14, 2016 at 8:30 PM, John Bickerstaff <
>> > j...@johnbickerstaff.com> wrote:
>> >
>> >> Thanks Eric!
>> >>
>> >> I'll look into that immediately - yes, I think that cURL would qualify as
>> >> scriptable for my IT lead.
>> >>
>> >> In the end, I found I could do it two ways...
>> >>
>> >> Either copy the entire solr data directory over to /var/solr/data on the
>> >> new machine, change the directory name and the entries in the
>> >> core.properties file, then start the already-installed Solr in cloud mode 
>> >> -
>> >> everything came up roses in the cloud section of the UI - the new replica
>> >> was there as part of the collection, properly named and worked fine.
>> >>
>> >> Alternatively, I used the command I mentioned earlier and then waited as
>> >> the data was replicated over to the newly-created replica -- again,
>> >> everything was roses in the Cloud section of the Admin UI...
>> >>
>> >> What might I have messed up in this scenario?  I didn't love the hackish
>> >> feeling either, but had been unable to find anything like the addreplica -
>> >> although I did look for a fairly long time - I'm glad to know about it 
>> >> now.
>> >>
>> >>
>> >>
>> >> On Thu, Apr 14, 2016 at 7:36 PM, Erick Erickson 
>> >> wrote:
>> >>
>> >>> bq:  the Solr site about how to add a
>> >>> replica to a Solr cloud.  The Admin UI appears to require that the
>> >>> directories be created anyway
>> >>>
>> >>> No, no, a thousand times NO! You're getting confused,
>> >>> I think, with the difference between _cores_ and _collections_
>> >>> (or replicas in a collection).
>> >>>
>> >>> Do not use the admin UI for _cores_ to create replicas. It's possible
>> >>> if (and only if) you do it exactly correctly. Instead, use the
>> >>> collections API
>> >>> ADDREPLICA command here:
>> >>>
>> >>> https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-api_addreplica
>> >>>
>> >>> Which you could cURL etc., does that qualify as "scripting" in your
>> >>> situation?
>> >>>
>> >>> You're right, the Solr instance must be up and running for the replica to
>> >>> be added, but that's not onerous
>> >>>
>> >>>
>> >>> The bin/solr script is a "work in progress", and doesn't have direct
>> >>> support
>> >>> for "addreplica", but it could be added.
>> >>>
>> >>> Best,
>> >>> Erick
>> >>>
>> >>> On Thu, Apr 14, 2016 at 6:22 PM, John Bickerstaff
>> >>>  wrote:
>> >>> > Sure - couldn't agree more.
>> >>> >
>> >>> > I couldn't find any good documentation on the Solr site about how to
>> >>> add a
>> >>> > replica to a Solr cloud.  The Admin UI appears to require that the
>> >>> > directories be created anyway.
>> >>> >
>> >>> > There is probably a way to do it through the UI, once Solr is
>> >>> installed on
>> >>> > a new machine - and 

Re: Question on Solr JDBC driver with SQL client like DB Visualizer

2016-04-15 Thread Reth RM
output of command :

org/apache/solr/client/solrj/io/sql/
META-INF/services/java.sql.Driver
org/apache/solr/client/solrj/io/sql/ConnectionImpl.class
org/apache/solr/client/solrj/io/sql/DatabaseMetaDataImpl.class
org/apache/solr/client/solrj/io/sql/DriverImpl.class
org/apache/solr/client/solrj/io/sql/ResultSetImpl.class
org/apache/solr/client/solrj/io/sql/ResultSetMetaDataImpl.class
org/apache/solr/client/solrj/io/sql/StatementImpl.class
org/apache/solr/client/solrj/io/sql/package-info.class



On Fri, Apr 15, 2016 at 9:01 PM, Kevin Risden 
wrote:

> >
> > Page 11, the screenshot specifies to select a
> > "solr-solrj-6.0.0-SNAPSHOT.jar" which is equivalent into
> > "solr-solrj-6.0.0.jar" shipped with released version, correct?
> >
>
> Correct the PDF was generated before 6.0.0 was released. The documentation
> from SOLR-8521 is being migrated to here:
>
>
> https://cwiki.apache.org/confluence/display/solr/Parallel+SQL+Interface#ParallelSQLInterface-SQLClientsandDatabaseVisualizationTools
>
>
> > When I try adding that jar, it doesn't show up driver class, DBVisualizer
> > still shows "No new driver class". Does it mean the class is not added to
> > this jar yet?
> >
>
> I checked the Solr 6.0.0 release and the driver is there. I was testing it
> yesterday for a blog series that I'm putting together.
>
> Just for reference here is the output for the Solr 6 release:
>
> tar -tvf solr-solrj-6.0.0.jar | grep sql
> drwxrwxrwx  0 0  0   0 Apr  1 14:40
> org/apache/solr/client/solrj/io/sql/
> -rwxrwxrwx  0 0  0 842 Apr  1 14:40
> META-INF/services/java.sql.Driver
> -rwxrwxrwx  0 0  0   10124 Apr  1 14:40
> org/apache/solr/client/solrj/io/sql/ConnectionImpl.class
> -rwxrwxrwx  0 0  0   23557 Apr  1 14:40
> org/apache/solr/client/solrj/io/sql/DatabaseMetaDataImpl.class
> -rwxrwxrwx  0 0  04459 Apr  1 14:40
> org/apache/solr/client/solrj/io/sql/DriverImpl.class
> -rwxrwxrwx  0 0  0   28333 Apr  1 14:40
> org/apache/solr/client/solrj/io/sql/ResultSetImpl.class
> -rwxrwxrwx  0 0  05167 Apr  1 14:40
> org/apache/solr/client/solrj/io/sql/ResultSetMetaDataImpl.class
> -rwxrwxrwx  0 0  0   10451 Apr  1 14:40
> org/apache/solr/client/solrj/io/sql/StatementImpl.class
> -rwxrwxrwx  0 0  0 141 Apr  1 14:40
> org/apache/solr/client/solrj/io/sql/package-info.class
>
>
> Kevin Risden
> Apache Lucene/Solr Committer
> Hadoop and Search Tech Lead | Avalon Consulting, LLC
> 
> M: 732 213 8417
> LinkedIn  | Google+
>  | Twitter
> 
>
>
> -
> This message (including any attachments) contains confidential information
> intended for a specific individual and purpose, and is protected by law. If
> you are not the intended recipient, you should delete this message. Any
> disclosure, copying, or distribution of this message, or the taking of any
> action based on it, is strictly prohibited.
>


Re: Singular Plural Results Inconsistent - SOLR v3.6 and EnglishMinimalStemFilterFactor

2016-04-15 Thread Walter Underwood
I looked at the PHP clients a couple of years ago and they didn’t seem to add 
much.

I wrote PHP code to make GET requests to Solr and parse the JSON response. It 
wasn’t much more code than doing it with a client library.

The client libraries don’t really do much for you. They can’t even keep 
connections open or pool them, because PHP doesn’t do that.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Apr 15, 2016, at 8:39 AM, Sara Woodmansee  wrote:
> 
> Hi Shawn,
> 
> No clue what PHP client they are using.
> 
> Thanks for the info!
> 
> Sara
> 
>> On Apr 15, 2016, at 10:35 AM, Shawn Heisey  wrote:
>> 
>> On 4/15/2016 8:15 AM, Sara Woodmansee wrote:
>>> When I suggested the developer consider upgrading to v5.5 or 6.0 (from 
>>> v3.6), this was their response.  It’s clear that upgrading is not going to 
>>> happen any time soon.
>>> 
>>> Developer response:  "But to use SOLR 5, there is a need to find a stable 
>>> and reliable php client. And until very recent time there were no release. 
>>> In other case we would have to write PHP client itself.  Then we would have 
>>> to rewrite integration API with a software, because API very likely has 
>>> changed. And then make changes to every single piece of code in backend and 
>>> frontend of our system that is tied up with search functionality in any 
>>> way. “
>>> 
>>> — I would still like to know (from you folks) if the “stable PHP client” 
>>> issue still holds true?  Perhaps that is not an easy question.
>> 
>> There should be PHP clients with Solr4 support.  Those should work well
>> with 5.x.  I don't know enough about 6 to comment on how compatible it
>> would be.
>> 
>> All PHP clients are third-party -- the project didn't write any of
>> them.  Which PHP client are you using now?
>> 
>> Thanks,
>> Shawn
>> 
> 



Re: Singular Plural Results Inconsistent - SOLR v3.6 and EnglishMinimalStemFilterFactor

2016-04-15 Thread Sara Woodmansee
Hi Shawn,

No clue what PHP client they are using.

Thanks for the info!

Sara

> On Apr 15, 2016, at 10:35 AM, Shawn Heisey  wrote:
> 
> On 4/15/2016 8:15 AM, Sara Woodmansee wrote:
>> When I suggested the developer consider upgrading to v5.5 or 6.0 (from 
>> v3.6), this was their response.  It’s clear that upgrading is not going to 
>> happen any time soon.
>> 
>> Developer response:  "But to use SOLR 5, there is a need to find a stable 
>> and reliable php client. And until very recent time there were no release. 
>> In other case we would have to write PHP client itself.  Then we would have 
>> to rewrite integration API with a software, because API very likely has 
>> changed. And then make changes to every single piece of code in backend and 
>> frontend of our system that is tied up with search functionality in any way. 
>> “
>> 
>> — I would still like to know (from you folks) if the “stable PHP client” 
>> issue still holds true?  Perhaps that is not an easy question.
> 
> There should be PHP clients with Solr4 support.  Those should work well
> with 5.x.  I don't know enough about 6 to comment on how compatible it
> would be.
> 
> All PHP clients are third-party -- the project didn't write any of
> them.  Which PHP client are you using now?
> 
> Thanks,
> Shawn
> 



Re: Question on Solr JDBC driver with SQL client like DB Visualizer

2016-04-15 Thread Kevin Risden
>
> Page 11, the screenshot specifies to select a
> "solr-solrj-6.0.0-SNAPSHOT.jar" which is equivalent into
> "solr-solrj-6.0.0.jar" shipped with released version, correct?
>

Correct, the PDF was generated before 6.0.0 was released. The documentation
from SOLR-8521 is being migrated to here:

https://cwiki.apache.org/confluence/display/solr/Parallel+SQL+Interface#ParallelSQLInterface-SQLClientsandDatabaseVisualizationTools


> When I try adding that jar, it doesn't show up driver class, DBVisualizer
> still shows "No new driver class". Does it mean the class is not added to
> this jar yet?
>

I checked the Solr 6.0.0 release and the driver is there. I was testing it
yesterday for a blog series that I'm putting together.

Just for reference here is the output for the Solr 6 release:

tar -tvf solr-solrj-6.0.0.jar | grep sql
drwxrwxrwx  0 0  0   0 Apr  1 14:40
org/apache/solr/client/solrj/io/sql/
-rwxrwxrwx  0 0  0 842 Apr  1 14:40
META-INF/services/java.sql.Driver
-rwxrwxrwx  0 0  0   10124 Apr  1 14:40
org/apache/solr/client/solrj/io/sql/ConnectionImpl.class
-rwxrwxrwx  0 0  0   23557 Apr  1 14:40
org/apache/solr/client/solrj/io/sql/DatabaseMetaDataImpl.class
-rwxrwxrwx  0 0  04459 Apr  1 14:40
org/apache/solr/client/solrj/io/sql/DriverImpl.class
-rwxrwxrwx  0 0  0   28333 Apr  1 14:40
org/apache/solr/client/solrj/io/sql/ResultSetImpl.class
-rwxrwxrwx  0 0  05167 Apr  1 14:40
org/apache/solr/client/solrj/io/sql/ResultSetMetaDataImpl.class
-rwxrwxrwx  0 0  0   10451 Apr  1 14:40
org/apache/solr/client/solrj/io/sql/StatementImpl.class
-rwxrwxrwx  0 0  0 141 Apr  1 14:40
org/apache/solr/client/solrj/io/sql/package-info.class
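Since `DriverImpl` and the `java.sql.Driver` service entry are both in the jar, a client only needs the right connection string. A small helper for the URL format described on the Parallel SQL Interface page (ZooKeeper host and collection name below are placeholders; check the linked docs for your version):

```python
def solr_jdbc_url(zk_host, collection, aggregation_mode=None):
    """Build a connection string for the solrj JDBC driver.
    zk_host is the ZooKeeper connection string, e.g. "localhost:9983"."""
    url = "jdbc:solr://%s?collection=%s" % (zk_host, collection)
    if aggregation_mode:  # e.g. "map_reduce" or "facet"
        url += "&aggregationMode=" + aggregation_mode
    return url

print(solr_jdbc_url("localhost:9983", "test"))
# jdbc:solr://localhost:9983?collection=test
```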


Kevin Risden
Apache Lucene/Solr Committer
Hadoop and Search Tech Lead | Avalon Consulting, LLC

M: 732 213 8417
LinkedIn  | Google+
 | Twitter


-
This message (including any attachments) contains confidential information
intended for a specific individual and purpose, and is protected by law. If
you are not the intended recipient, you should delete this message. Any
disclosure, copying, or distribution of this message, or the taking of any
action based on it, is strictly prohibited.


RE: Shard ranges seem incorrect

2016-04-15 Thread Markus Jelsma
Thanks both. I completely missed Shawn's response.
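As a quick sanity check on the ranges from the original message: read as unsigned boundaries on the 32-bit hash ring (and assuming the usual full 8-hex-digit forms, e.g. 80000000-d554ffff), the three shard ranges tile the whole space exactly, with shard2 wrapping through zero — which is why they look odd but are correct:

```python
def span(lo, hi):
    """Unsigned size of an inclusive range on the 32-bit hash ring,
    allowing wrap-around (hi < lo)."""
    return (hi - lo) % 2**32 + 1

ranges = [
    (0x80000000, 0xD554FFFF),  # shard1
    (0xD5550000, 0x2AA9FFFF),  # shard2 (wraps through 0)
    (0x2AAA0000, 0x7FFFFFFF),  # shard3
]
total = sum(span(lo, hi) for lo, hi in ranges)
print(total == 2**32)  # True: contiguous ranges, no gap in coverage
```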


 
 
-Original message-
> From:Chris Hostetter 
> Sent: Thursday 14th April 2016 22:48
> To: solr-user@lucene.apache.org
> Subject: RE: Shard ranges seem incorrect
> 
> 
> : Hi - bumping this issue. Any thoughts to share?
> 
> Shawn's response to your email seemed spot on acurate to me -- is there 
> something about his answer that doesn't match up with what you're seeing? 
> can you clarify/elaborate your concerns?
> 
> http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201604.mbox/%3c570d0a03.5010...@elyograg.org%3E
> 
> 
>  :  
> : -Original message-
> : > From:Markus Jelsma 
> : > Sent: Tuesday 12th April 2016 13:49
> : > To: solr-user 
> : > Subject: Shard ranges seem incorrect
> : > 
> : > Hi - i've just created a 3 shard 3 replica collection on Solr 6.0.0 and 
> we noticed something odd, the hashing ranges don't make sense (full 
> state.json below):
> : > shard1 Range: 8000-d554
> : > shard2 Range: d555-2aa9
> : > shard3 Range: 2aaa-7fff
> : > 
> : > We've also noticed ranges not going from 0 to  for a 5.5 create 
> single shard collection. Another collection created on an older (unknown) 
> release has correct shard ranges. Any idea what's going on?
> : > Thanks,
> : > Markus
> : > 
> : > {"logs":{
> : > "replicationFactor":"3",
> : > "router":{"name":"compositeId"},
> : > "maxShardsPerNode":"9",
> : > "autoAddReplicas":"false",
> : > "shards":{
> : >   "shard1":{
> : > "range":"8000-d554",
> : > "state":"active",
> : > "replicas":{
> : >   "core_node3":{
> : > "core":"logs_shard1_replica3",
> : > "base_url":"http://127.0.1.1:8983/solr;,
> : > "node_name":"127.0.1.1:8983_solr",
> : > "state":"active"},
> : >   "core_node4":{
> : > "core":"logs_shard1_replica1",
> : > "base_url":"http://127.0.1.1:8983/solr;,
> : > "node_name":"127.0.1.1:8983_solr",
> : > "state":"active",
> : > "leader":"true"},
> : >   "core_node8":{
> : > "core":"logs_shard1_replica2",
> : > "base_url":"http://127.0.1.1:8983/solr;,
> : > "node_name":"127.0.1.1:8983_solr",
> : > "state":"active"}}},
> : >   "shard2":{
> : > "range":"d555-2aa9",
> : > "state":"active",
> : > "replicas":{
> : >   "core_node1":{
> : > "core":"logs_shard2_replica1",
> : > "base_url":"http://127.0.1.1:8983/solr;,
> : > "node_name":"127.0.1.1:8983_solr",
> : > "state":"active",
> : > "leader":"true"},
> : >   "core_node2":{
> : > "core":"logs_shard2_replica2",
> : > "base_url":"http://127.0.1.1:8983/solr;,
> : > "node_name":"127.0.1.1:8983_solr",
> : > "state":"active"},
> : >   "core_node9":{
> : > "core":"logs_shard2_replica3",
> : > "base_url":"http://127.0.1.1:8983/solr;,
> : > "node_name":"127.0.1.1:8983_solr",
> : > "state":"active"}}},
> : >   "shard3":{
> : > "range":"2aaa-7fff",
> : > "state":"active",
> : > "replicas":{
> : >   "core_node5":{
> : > "core":"logs_shard3_replica1",
> : > "base_url":"http://127.0.1.1:8983/solr;,
> : > "node_name":"127.0.1.1:8983_solr",
> : > "state":"active",
> : > "leader":"true"},
> : >   "core_node6":{
> : > "core":"logs_shard3_replica2",
> : > "base_url":"http://127.0.1.1:8983/solr;,
> : > "node_name":"127.0.1.1:8983_solr",
> : > "state":"active"},
> : >   "core_node7":{
> : > "core":"logs_shard3_replica3",
> : > "base_url":"http://127.0.1.1:8983/solr;,
> : > "node_name":"127.0.1.1:8983_solr",
> : > "state":"active"}}
> : > 
> : > 
> : > 
> : > 
> : > 
> : 
> 
> -Hoss
> http://www.lucidworks.com/
> 


Re: Solr best practices for many to many relations...

2016-04-15 Thread Jack Krupansky
And it may also be that there are whole classes of users for whom
denormalization is just too heavy a cross to bear and for whom a little
extra money spent on more hardware is a great tradeoff.

And... Lucene's indexing may be superior to your average SQL database, so
that a Solr JOIN could be so much better than your average RDBMS SQL JOIN.
That would be an interesting benchmark.
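To make the discussion concrete, this is roughly what such a join looks like as a Streaming Expression (the collection and field names here are made up for illustration); the sketch just assembles the expression string that would be sent to a collection's /stream handler:

```python
def inner_join_expr(left_coll, right_coll, key, left_fl, right_fl):
    """Assemble an innerJoin Streaming Expression. Both sides must be
    sorted on the join key for the streaming merge join to work."""
    return (
        'innerJoin('
        'search(%s, q="*:*", fl="%s", sort="%s asc"), '
        'search(%s, q="*:*", fl="%s", sort="%s asc"), '
        'on="%s")'
        % (left_coll, left_fl, key, right_coll, right_fl, key, key)
    )

expr = inner_join_expr("people", "pets", "personId",
                       "personId,name", "personId,petName")
print(expr)
```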

-- Jack Krupansky

On Fri, Apr 15, 2016 at 11:06 AM, Joel Bernstein  wrote:

> I think people are going to be surprised though by the speed of the joins.
> The joins also get faster as the number of shards, replicas and worker
> nodes grow in the cluster. So we may see people building out large clusters
> and and using the joins in OLTP scenarios.
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
> On Fri, Apr 15, 2016 at 10:58 AM, Jack Krupansky  >
> wrote:
>
> > And of course it depends on the specific queries, both in terms of what
> > fields will be searched and which fields need to be returned.
> >
> > Yes, OLAP is the clear sweet spot, where taking 500 ms to 2 or even 20
> > seconds for a complex query may be just fine vs. OLTP/search where under
> > 150 ms is the target. But, again, it will depend on the nature of the
> > query, the cardinality of each search field, the cross product of
> > cardinality of search fields, etc.
> >
> > -- Jack Krupansky
> >
> > On Fri, Apr 15, 2016 at 10:44 AM, Joel Bernstein 
> > wrote:
> >
> > > In general the Streaming Expression joins are designed for interactive
> > OLAP
> > > type work loads. So BI and data warehousing scenarios are the sweet
> spot.
> > > There may be scenarios where high QPS search applications will work
> with
> > > the distributed joins, particularly if the joins themselves are not
> huge.
> > > But the specific use cases need to be tested.
> > >
> > > Joel Bernstein
> > > http://joelsolr.blogspot.com/
> > >
> > > On Fri, Apr 15, 2016 at 10:24 AM, Jack Krupansky <
> > jack.krupan...@gmail.com
> > > >
> > > wrote:
> > >
> > > > It will be interesting to see which use cases work best with the new
> > > > streaming JOIN vs. which will remain best with full denormalization,
> or
> > > > whether you simply have to try both and benchmark them.
> > > >
> > > > My impression had been that streaming JOIN would be ideal for bulk
> > > > operations rather than traditional-style search queries. Maybe there
> > are
> > > > three use cases: bulk read based on broad criteria, top-n relevance
> > > search
> > > > query, and specific document (or small number of documents) based on
> > > > multiple fields.
> > > >
> > > > My suspicion is that doing JOIN on five tables will likely be slower
> > than
> > > > accessing a single document of a denormalized table/index.
> > > >
> > > > -- Jack Krupansky
> > > >
> > > > On Fri, Apr 15, 2016 at 9:56 AM, Joel Bernstein 
> > > > wrote:
> > > >
> > > > > Solr now has full distributed join capabilities as part of the
> > > Streaming
> > > > > Expression library. Keep in mind that these are distributed joins
> so
> > > they
> > > > > shuffle records to worker nodes to perform the joins. These are
> > > > comparable
> > > > > to joins done by SQL over MapReduce systems, but they are very
> > > responsive
> > > > > and can respond with sub-second response time for fairly large
> joins
> > in
> > > > > parallel mode. But these joins do lend themselves to large
> > distributed
> > > > > architectures (lot's of shards an replicas). Target QPS also needs
> to
> > > be
> > > > > taken into account and tested in deciding whether these joins will
> > meet
> > > > the
> > > > > specific use case.
> > > > >
> > > > > Joel Bernstein
> > > > > http://joelsolr.blogspot.com/
> > > > >
> > > > > On Fri, Apr 15, 2016 at 9:17 AM, Dennis Gove 
> > wrote:
> > > > >
> > > > > > The Streaming API with Streaming Expressions (or Parallel SQL if
> > you
> > > > want
> > > > > > to use SQL) can give you the functionality you're looking for.
> See
> > > > > >
> > > https://cwiki.apache.org/confluence/display/solr/Streaming+Expressions
> > > > > > and
> > > > > >
> > > >
> > https://cwiki.apache.org/confluence/display/solr/Parallel+SQL+Interface.
> > > > > > SQL queries coming in through the Parallel SQL Interface are
> > > translated
> > > > > > down into Streaming Expressions - if you need to do something
> that
> > > SQL
> > > > > > doesn't yet support you should check out the Streaming
> Expressions
> > to
> > > > see
> > > > > > if it can support it.
> > > > > >
> > > > > > With these you could store your data in separate collections (or
> > the
> > > > same
> > > > > > collection with different docType field values) and then during
> > > search
> > > > > > perform a join (inner, outer, hash) across the collections. You
> > > could,
> > > > if
> > > > > > you wanted, even join with data NOT in solr using the jdbc
> > streaming
> > > > > > function.
> > > > > >
> 

Re: Anticipated Solr 5.5.1 release date

2016-04-15 Thread Tom Evans
Awesome, thanks :)

On Fri, Apr 15, 2016 at 4:19 PM, Anshum Gupta  wrote:
> Hi Tom,
>
> I plan on getting a release candidate out for vote by Monday. If all goes
> well, it'd be about a week from then for the official release.
>
> On Fri, Apr 15, 2016 at 6:52 AM, Tom Evans  wrote:
>
>> Hi all
>>
>> We're currently using Solr 5.5.0 and converting our regular old style
>> facets into JSON facets, and are running in to SOLR-8155 and
>> SOLR-8835. I can see these have already been back-ported to 5.5.x
>> branch, does anyone know when 5.5.1 may be released?
>>
>> We don't particularly want to move to Solr 6, as we have only just
>> finished validating 5.5.0 with our original queries!
>>
>> Cheers
>>
>> Tom
>>
>
>
>
> --
> Anshum Gupta


Re: Question on Solr JDBC driver with SQL client like DB Visualizer

2016-04-15 Thread Joel Bernstein
Can you post the output from the command below? Notice the driver classes in
the trunk snapshot on my desktop.

jar -tvf solr-solrj-7.0.0-SNAPSHOT.jar | grep sql

     0 Sun Apr 03 20:20:28 EDT 2016 org/apache/solr/client/solrj/io/sql/
   842 Sun Apr 03 20:20:28 EDT 2016 META-INF/services/java.sql.Driver
 10124 Sun Apr 03 20:20:28 EDT 2016 org/apache/solr/client/solrj/io/sql/ConnectionImpl.class
 23557 Sun Apr 03 20:20:28 EDT 2016 org/apache/solr/client/solrj/io/sql/DatabaseMetaDataImpl.class
  4459 Sun Apr 03 20:20:28 EDT 2016 org/apache/solr/client/solrj/io/sql/DriverImpl.class
 28333 Sun Apr 03 20:20:28 EDT 2016 org/apache/solr/client/solrj/io/sql/ResultSetImpl.class
  5167 Sun Apr 03 20:20:28 EDT 2016 org/apache/solr/client/solrj/io/sql/ResultSetMetaDataImpl.class
 10451 Sun Apr 03 20:20:28 EDT 2016 org/apache/solr/client/solrj/io/sql/StatementImpl.class
   141 Sun Apr 03 20:20:28 EDT 2016 org/apache/solr/client/solrj/io/sql/package-info.class
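Since JDBC 4 drivers are discovered through that META-INF/services/java.sql.Driver
entry, a quick way to check whether the Solr driver is actually visible on a
client tool's classpath is a plain ServiceLoader listing (a generic sketch —
which class names print depends entirely on the jars present):

```java
import java.sql.Driver;
import java.util.ServiceLoader;

public class ListJdbcDrivers {
    // JDBC 4 drivers are auto-registered through META-INF/services/java.sql.Driver.
    // If a solrj jar containing that service entry is on the classpath, then
    // org.apache.solr.client.solrj.io.sql.DriverImpl should appear below; if it
    // does not, the tool is loading a jar without the driver (e.g. an older solrj).
    public static void main(String[] args) {
        for (Driver d : ServiceLoader.load(Driver.class)) {
            System.out.println(d.getClass().getName());
        }
    }
}
```

Running this with the same classpath DBVisualizer uses would show whether "No
new driver class" is a missing-service-entry problem or a tool configuration
problem.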

Joel Bernstein
http://joelsolr.blogspot.com/

On Fri, Apr 15, 2016 at 10:15 AM, Reth RM  wrote:

> Note: I followed the steps mentioned in the pdf attached on this Jira
> https://issues.apache.org/jira/browse/SOLR-8521
>
> Page 11, the screenshot specifies to select a
> "solr-solrj-6.0.0-SNAPSHOT.jar" which is equivalent into
> "solr-solrj-6.0.0.jar" shipped with released version, correct?
>
> When I try adding that jar, it doesn't show up driver class, DBVisualizer
> still shows "No new driver class". Does it mean the class is not added to
> this jar yet?
>


Re: Anticipated Solr 5.5.1 release date

2016-04-15 Thread Anshum Gupta
Hi Tom,

I plan on getting a release candidate out for vote by Monday. If all goes
well, it'd be about a week from then for the official release.

On Fri, Apr 15, 2016 at 6:52 AM, Tom Evans  wrote:

> Hi all
>
> We're currently using Solr 5.5.0 and converting our regular old style
> facets into JSON facets, and are running in to SOLR-8155 and
> SOLR-8835. I can see these have already been back-ported to 5.5.x
> branch, does anyone know when 5.5.1 may be released?
>
> We don't particularly want to move to Solr 6, as we have only just
> finished validating 5.5.0 with our original queries!
>
> Cheers
>
> Tom
>



-- 
Anshum Gupta


Re: Can a field be an array of fields?

2016-04-15 Thread Jack Krupansky
It all depends on what your queries look like - what input data does your
application have and what data does it need to retrieve.

My recommendation is that you store first name and last name as separate,
multivalued fields if you indeed need to query by precisely a first or last
name, but also store the full name as a separate multivalued text field. If
you want to search by only first or last name, fine. If you want to search
by full name or wildcards, etc., you can use the full name field with a
phrase query. You can use an update request processor to combine first and
last name into that third field. You could also store the full name in a
fourth field as raw JSON if you really need structure in the result. The
third field might have first and last name with a special separator such as
"|", although a simple comma is typically sufficient.


-- Jack Krupansky

On Fri, Apr 15, 2016 at 10:58 AM, Davis, Daniel (NIH/NLM) [C] <
daniel.da...@nih.gov> wrote:

> Short answer - JOINs, external query outside Solr, Elastic Search ;)
> Alternatives:
>   * You get back an id for each document when you query on "Nino".   You
> look up the last names in some other system that has the full list.
>   * You index the authors in another collection and use JOINs
>   * You store the author_array as formatted, escaped JSON, stored, but not
> indexed (or analyzed).   When you get the data back, you navigate the JSON
> to the author_array, get the value, and parse that value as JSON.   Now you
> have the full list.
>   * This is a sweet spot for Elastic Search, to be perfectly honest.
>
> -Original Message-
> From: Bastien Latard - MDPI AG [mailto:lat...@mdpi.com.INVALID]
> Sent: Friday, April 15, 2016 7:52 AM
> To: solr-user@lucene.apache.org
> Subject: Can a field be an array of fields?
>
> Hi everybody!
>
> /I described a bit what I found in another thread, but I prefer to create
> a new thread for this specific question.../ *It's **possible to create an
> array of string by doing (incomplete example):
> - in the data-conf.xml:*
> 
>
> 
>   
>   
>   
>   
> 
>
> 
>
> *- in schema.xml:
> * required="false" multiValued="true" />
>  required="false" multiValued="true" />
>  required="false" multiValued="true" />
>  required="false" multiValued="true" />
>
> And this provides something like:
>
> "docs":[
>{
> [...]
> "given_name":["Bastien",  "Matthieu",  "Nino"],
> "last_name":["lastname1", "lastname2",
>  "lastname3",   "lastname4"],
>
> [...]
>
>
> *Note: there can be one author with only a last_name, and then we are
> unable to tell which one it is...*
>
> My goal would be to get this as a result:
>
> "docs":[
>{
> [...]
> "authors_array":
>  [
> [
> "given_name":["Bastien"],
> "last_name":["lastname1"]
>  ],
> [
> "last_name":["lastname2"]
>  ],
> [
> "given_name":["Matthieu"],
> "last_name":["lastname2"]
>  ],
> [
> "given_name":["Nino"],
> "last_name":["lastname4"]
>  ],
>  ]
> [...]
>
>
> Is there any way to do this?
> /PS: I know that I could do '//select if(a.given_name is not null,
> a.given_name ,'') as given_name, [...]//' but I would like to get an
> array.../
>
> I tried to add something like that to the schema.xml, but this doesn't
> work (well, it might be of type 'array'):
>  required="false" multiValued="true"/>
>
> Kind regards,
> Bastien Latard
> Web engineer
> --
> MDPI AG
> Postfach, CH-4005 Basel, Switzerland
> Office: Klybeckstrasse 64, CH-4057
> Tel. +41 61 683 77 35
> Fax: +41 61 302 89 18
> E-mail:
> lat...@mdpi.com
> http://www.mdpi.com/
>
>


Re: Solr best practices for many to many relations...

2016-04-15 Thread Joel Bernstein
I think people are going to be surprised, though, by the speed of the joins.
The joins also get faster as the number of shards, replicas and worker
nodes in the cluster grows. So we may see people building out large clusters
and using the joins in OLTP scenarios.

Joel Bernstein
http://joelsolr.blogspot.com/

On Fri, Apr 15, 2016 at 10:58 AM, Jack Krupansky 
wrote:

> And of course it depends on the specific queries, both in terms of what
> fields will be searched and which fields need to be returned.
>
> Yes, OLAP is the clear sweet spot, where taking 500 ms to 2 or even 20
> seconds for a complex query may be just fine vs. OLTP/search where under
> 150 ms is the target. But, again, it will depend on the nature of the
> query, the cardinality of each search field, the cross product of
> cardinality of search fields, etc.
>
> -- Jack Krupansky
>
> On Fri, Apr 15, 2016 at 10:44 AM, Joel Bernstein 
> wrote:
>
> > In general the Streaming Expression joins are designed for interactive
> OLAP
> > type work loads. So BI and data warehousing scenarios are the sweet spot.
> > There may be scenarios where high QPS search applications will work with
> > the distributed joins, particularly if the joins themselves are not huge.
> > But the specific use cases need to be tested.
> >
> > Joel Bernstein
> > http://joelsolr.blogspot.com/
> >
> > On Fri, Apr 15, 2016 at 10:24 AM, Jack Krupansky <
> jack.krupan...@gmail.com
> > >
> > wrote:
> >
> > > It will be interesting to see which use cases work best with the new
> > > streaming JOIN vs. which will remain best with full denormalization, or
> > > whether you simply have to try both and benchmark them.
> > >
> > > My impression had been that streaming JOIN would be ideal for bulk
> > > operations rather than traditional-style search queries. Maybe there
> are
> > > three use cases: bulk read based on broad criteria, top-n relevance
> > search
> > > query, and specific document (or small number of documents) based on
> > > multiple fields.
> > >
> > > My suspicion is that doing JOIN on five tables will likely be slower
> than
> > > accessing a single document of a denormalized table/index.
> > >
> > > -- Jack Krupansky
> > >
> > > On Fri, Apr 15, 2016 at 9:56 AM, Joel Bernstein 
> > > wrote:
> > >
> > > > Solr now has full distributed join capabilities as part of the
> > Streaming
> > > > Expression library. Keep in mind that these are distributed joins so
> > they
> > > > shuffle records to worker nodes to perform the joins. These are
> > > comparable
> > > > to joins done by SQL over MapReduce systems, but they are very
> > responsive
> > > > and can respond with sub-second response time for fairly large joins
> in
> > > > parallel mode. But these joins do lend themselves to large
> distributed
> > > > architectures (lot's of shards an replicas). Target QPS also needs to
> > be
> > > > taken into account and tested in deciding whether these joins will
> meet
> > > the
> > > > specific use case.
> > > >
> > > > Joel Bernstein
> > > > http://joelsolr.blogspot.com/
> > > >
> > > > On Fri, Apr 15, 2016 at 9:17 AM, Dennis Gove 
> wrote:
> > > >
> > > > > The Streaming API with Streaming Expressions (or Parallel SQL if
> you
> > > want
> > > > > to use SQL) can give you the functionality you're looking for. See
> > > > >
> > https://cwiki.apache.org/confluence/display/solr/Streaming+Expressions
> > > > > and
> > > > >
> > >
> https://cwiki.apache.org/confluence/display/solr/Parallel+SQL+Interface.
> > > > > SQL queries coming in through the Parallel SQL Interface are
> > translated
> > > > > down into Streaming Expressions - if you need to do something that
> > SQL
> > > > > doesn't yet support you should check out the Streaming Expressions
> to
> > > see
> > > > > if it can support it.
> > > > >
> > > > > With these you could store your data in separate collections (or
> the
> > > same
> > > > > collection with different docType field values) and then during
> > search
> > > > > perform a join (inner, outer, hash) across the collections. You
> > could,
> > > if
> > > > > you wanted, even join with data NOT in solr using the jdbc
> streaming
> > > > > function.
> > > > >
> > > > > - Dennis Gove
> > > > >
> > > > >
> > > > > On Fri, Apr 15, 2016 at 3:21 AM, Bastien Latard - MDPI AG <
> > > > > lat...@mdpi.com.invalid> wrote:
> > > > >
> > > > >> '*would I then be able to query a specific field of articles or
> > other
> > > > >> "table" (with the same OR BETTER performances)?*'
> > > > >> -> And especially, would I be able to get only 1 article in the
> > > > result...
> > > > >>
> > > > >> On 15/04/2016 09:06, Bastien Latard - MDPI AG wrote:
> > > > >>
> > > > >> Thanks Jack.
> > > > >>
> > > > >> I know that Solr is a search engine, but this replace a search in
> my
> > > > >> mysql DB with this model:
> > > > >>
> > > > >>
> > > > >> *My goal is to improve my environment (and my 

RE: Can a field be an array of fields?

2016-04-15 Thread Davis, Daniel (NIH/NLM) [C]
Short answer - JOINs, external query outside Solr, Elastic Search ;)
Alternatives:
  * You get back an id for each document when you query on "Nino".   You look 
up the last names in some other system that has the full list.
  * You index the authors in another collection and use JOINs
  * You store the author_array as formatted, escaped JSON, stored, but not 
indexed (or analyzed).   When you get the data back, you navigate the JSON to 
the author_array, get the value, and parse that value as JSON.   Now you have 
the full list.
  * This is a sweet spot for Elastic Search, to be perfectly honest.

-Original Message-
From: Bastien Latard - MDPI AG [mailto:lat...@mdpi.com.INVALID] 
Sent: Friday, April 15, 2016 7:52 AM
To: solr-user@lucene.apache.org
Subject: Can a field be an array of fields?

Hi everybody!

/I described a bit what I found in another thread, but I prefer to create a new 
thread for this specific question.../ *It's **possible to create an array of 
string by doing (incomplete example):
- in the data-conf.xml:*



  
  
  
  




*- in schema.xml:
*




And this provides something like:

"docs":[
   {
[...]
"given_name":["Bastien",  "Matthieu",  "Nino"],
"last_name":["lastname1", "lastname2", "lastname3", 
  "lastname4"],

[...]


*Note: there can be one author with only a last_name, and then we are unable to 
tell which one it is...*

My goal would be to get this as a result:

"docs":[
   {
[...]
"authors_array":
 [  
[
"given_name":["Bastien"],
"last_name":["lastname1"]
 ],
[
"last_name":["lastname2"]
 ],
[
"given_name":["Matthieu"],
"last_name":["lastname2"]
 ],
[
"given_name":["Nino"],
"last_name":["lastname4"]
 ],
 ]
[...]


Is there any way to do this?
/PS: I know that I could do '//select if(a.given_name is not null, a.given_name 
,'') as given_name, [...]//' but I would like to get an array.../

I tried to add something like that to the schema.xml, but this doesn't work 
(well, it might be of type 'array'):


Kind regards,
Bastien Latard
Web engineer
--
MDPI AG
Postfach, CH-4005 Basel, Switzerland
Office: Klybeckstrasse 64, CH-4057
Tel. +41 61 683 77 35
Fax: +41 61 302 89 18
E-mail:
lat...@mdpi.com
http://www.mdpi.com/



Re: Adding docValues in schema - Solr Cloud 4.8.1

2016-04-15 Thread Vincenzo D'Amore
Thanks Shawn,

just to confirm your claim.
Following your suggestion I double-checked my queries with grouping and
faceting. Faceting and grouping became empty immediately after I added
docValues.

Thanks again for your support,
Vincenzo

On Fri, Apr 15, 2016 at 4:31 PM, Shawn Heisey  wrote:

> On 4/15/2016 7:42 AM, Vincenzo D'Amore wrote:
> > I would like to add docValues to few fields definition in production.
> >
> > I first tried in a test environment during partial reindexing and it
> seems
> > have no effect, (i.e. no real benefits with small number of documents to
> > reindex, 30% of total).
>
> Adding docValues requires a full reindex.  If your index meets the
> criteria for Atomic Updates, then you could do an atomic update on every
> document, but either way, you're going to be indexing every document again.
>
> The problem with not reindexing is that certain Solr features will
> switch to use docValues if the schema says the field has them ... but
> until you reindex, your existing documents will not actually contain
> docValues, so those features will not work on the majority of your
> index.  Those features will *not* fall back to indexed data if the
> schema says docValues="true".
>
> The list of features that won't work right without a reindex is the list
> of features that benefit from docValues (sorting, faceting, grouping),
> so usually there's no reason to add docValues unless you're using at
> least one of those features.
>
> Thanks,
> Shawn
>
>


-- 
Vincenzo D'Amore
email: v.dam...@gmail.com
skype: free.dev
mobile: +39 349 8513251


Re: Solr best practices for many to many relations...

2016-04-15 Thread Jack Krupansky
And of course it depends on the specific queries, both in terms of what
fields will be searched and which fields need to be returned.

Yes, OLAP is the clear sweet spot, where taking 500 ms to 2 or even 20
seconds for a complex query may be just fine vs. OLTP/search where under
150 ms is the target. But, again, it will depend on the nature of the
query, the cardinality of each search field, the cross product of
cardinality of search fields, etc.

-- Jack Krupansky

On Fri, Apr 15, 2016 at 10:44 AM, Joel Bernstein  wrote:

> In general the Streaming Expression joins are designed for interactive OLAP
> type work loads. So BI and data warehousing scenarios are the sweet spot.
> There may be scenarios where high QPS search applications will work with
> the distributed joins, particularly if the joins themselves are not huge.
> But the specific use cases need to be tested.
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
> On Fri, Apr 15, 2016 at 10:24 AM, Jack Krupansky  >
> wrote:
>
> > It will be interesting to see which use cases work best with the new
> > streaming JOIN vs. which will remain best with full denormalization, or
> > whether you simply have to try both and benchmark them.
> >
> > My impression had been that streaming JOIN would be ideal for bulk
> > operations rather than traditional-style search queries. Maybe there are
> > three use cases: bulk read based on broad criteria, top-n relevance
> search
> > query, and specific document (or small number of documents) based on
> > multiple fields.
> >
> > My suspicion is that doing JOIN on five tables will likely be slower than
> > accessing a single document of a denormalized table/index.
> >
> > -- Jack Krupansky
> >
> > On Fri, Apr 15, 2016 at 9:56 AM, Joel Bernstein 
> > wrote:
> >
> > > Solr now has full distributed join capabilities as part of the
> Streaming
> > > Expression library. Keep in mind that these are distributed joins so
> they
> > > shuffle records to worker nodes to perform the joins. These are
> > comparable
> > > to joins done by SQL over MapReduce systems, but they are very
> responsive
> > > and can respond with sub-second response time for fairly large joins in
> > > parallel mode. But these joins do lend themselves to large distributed
> > > architectures (lot's of shards an replicas). Target QPS also needs to
> be
> > > taken into account and tested in deciding whether these joins will meet
> > the
> > > specific use case.
> > >
> > > Joel Bernstein
> > > http://joelsolr.blogspot.com/
> > >
> > > On Fri, Apr 15, 2016 at 9:17 AM, Dennis Gove  wrote:
> > >
> > > > The Streaming API with Streaming Expressions (or Parallel SQL if you
> > want
> > > > to use SQL) can give you the functionality you're looking for. See
> > > >
> https://cwiki.apache.org/confluence/display/solr/Streaming+Expressions
> > > > and
> > > >
> > https://cwiki.apache.org/confluence/display/solr/Parallel+SQL+Interface.
> > > > SQL queries coming in through the Parallel SQL Interface are
> translated
> > > > down into Streaming Expressions - if you need to do something that
> SQL
> > > > doesn't yet support you should check out the Streaming Expressions to
> > see
> > > > if it can support it.
> > > >
> > > > With these you could store your data in separate collections (or the
> > same
> > > > collection with different docType field values) and then during
> search
> > > > perform a join (inner, outer, hash) across the collections. You
> could,
> > if
> > > > you wanted, even join with data NOT in solr using the jdbc streaming
> > > > function.
> > > >
> > > > - Dennis Gove
> > > >
> > > >
> > > > On Fri, Apr 15, 2016 at 3:21 AM, Bastien Latard - MDPI AG <
> > > > lat...@mdpi.com.invalid> wrote:
> > > >
> > > >> '*would I then be able to query a specific field of articles or
> other
> > > >> "table" (with the same OR BETTER performances)?*'
> > > >> -> And especially, would I be able to get only 1 article in the
> > > result...
> > > >>
> > > >> On 15/04/2016 09:06, Bastien Latard - MDPI AG wrote:
> > > >>
> > > >> Thanks Jack.
> > > >>
> > > >> I know that Solr is a search engine, but this replace a search in my
> > > >> mysql DB with this model:
> > > >>
> > > >>
> > > >> *My goal is to improve my environment (and my performances at the
> same
> > > >> time).*
> > > >>
> > > >> *Yes, I have a Solr data model... but atm I created 4 different
> > indexes
> > > >> for "similar service usage".*
> > > >> *So atm, for 70 millions of documents, I am duplicating journal data
> > and
> > > >> publisher data all the time in 1 index (for all articles from the
> same
> > > >> journal/pub) in order to be able to retrieve all data in 1 query...*
> > > >>
> > > >> *I found yesterday that there is the possibility to create like an
> > array
> > > >> of  in the data-conf.xml.*
> > > >> e.g. (pseudo code - incomplete):
> > > >> 
> > > >> 
> > > >> 
> > > >> 
> > > >>
> > > >>
> > > 

Re: Solr best practices for many to many relations...

2016-04-15 Thread Joel Bernstein
In general the Streaming Expression joins are designed for interactive OLAP
type work loads. So BI and data warehousing scenarios are the sweet spot.
There may be scenarios where high QPS search applications will work with
the distributed joins, particularly if the joins themselves are not huge.
But the specific use cases need to be tested.
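For intuition about what one flavor of these joins does: a hash join buffers
the smaller stream in memory keyed on the join field, then streams the larger
side past it and emits matches. A plain-Java sketch of that technique (keys
and tuple layout are illustrative, this is not Solr's internal code):

```java
import java.util.*;

public class HashJoinSketch {
    // Hash join: build a map from the right-hand stream keyed on the join
    // field (element [0]), then stream the left side and emit each match
    // as "key:leftValue+rightValue".
    public static List<String> join(List<String[]> left, List<String[]> right) {
        Map<String, List<String[]>> byKey = new HashMap<>();
        for (String[] r : right) {
            byKey.computeIfAbsent(r[0], k -> new ArrayList<>()).add(r);
        }
        List<String> out = new ArrayList<>();
        for (String[] l : left) {
            for (String[] r : byKey.getOrDefault(l[0], Collections.emptyList())) {
                out.add(l[0] + ":" + l[1] + "+" + r[1]);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String[]> articles = Arrays.asList(
                new String[]{"j1", "articleA"}, new String[]{"j2", "articleB"});
        List<String[]> journals = Arrays.asList(
                new String[]{"j1", "Journal One"});
        System.out.println(join(articles, journals)); // [j1:articleA+Journal One]
    }
}
```

In the distributed case the extra cost is shuffling tuples to worker nodes by
join key first, which is why cluster size and target QPS matter so much.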

Joel Bernstein
http://joelsolr.blogspot.com/

On Fri, Apr 15, 2016 at 10:24 AM, Jack Krupansky 
wrote:

> It will be interesting to see which use cases work best with the new
> streaming JOIN vs. which will remain best with full denormalization, or
> whether you simply have to try both and benchmark them.
>
> My impression had been that streaming JOIN would be ideal for bulk
> operations rather than traditional-style search queries. Maybe there are
> three use cases: bulk read based on broad criteria, top-n relevance search
> query, and specific document (or small number of documents) based on
> multiple fields.
>
> My suspicion is that doing JOIN on five tables will likely be slower than
> accessing a single document of a denormalized table/index.
>
> -- Jack Krupansky
>
> On Fri, Apr 15, 2016 at 9:56 AM, Joel Bernstein 
> wrote:
>
> > Solr now has full distributed join capabilities as part of the Streaming
> > Expression library. Keep in mind that these are distributed joins so they
> > shuffle records to worker nodes to perform the joins. These are
> comparable
> > to joins done by SQL over MapReduce systems, but they are very responsive
> > and can respond with sub-second response time for fairly large joins in
> > parallel mode. But these joins do lend themselves to large distributed
> > architectures (lot's of shards an replicas). Target QPS also needs to be
> > taken into account and tested in deciding whether these joins will meet
> the
> > specific use case.
> >
> > Joel Bernstein
> > http://joelsolr.blogspot.com/
> >
> > On Fri, Apr 15, 2016 at 9:17 AM, Dennis Gove  wrote:
> >
> > > The Streaming API with Streaming Expressions (or Parallel SQL if you
> want
> > > to use SQL) can give you the functionality you're looking for. See
> > > https://cwiki.apache.org/confluence/display/solr/Streaming+Expressions
> > > and
> > >
> https://cwiki.apache.org/confluence/display/solr/Parallel+SQL+Interface.
> > > SQL queries coming in through the Parallel SQL Interface are translated
> > > down into Streaming Expressions - if you need to do something that SQL
> > > doesn't yet support you should check out the Streaming Expressions to
> see
> > > if it can support it.
> > >
> > > With these you could store your data in separate collections (or the
> same
> > > collection with different docType field values) and then during search
> > > perform a join (inner, outer, hash) across the collections. You could,
> if
> > > you wanted, even join with data NOT in solr using the jdbc streaming
> > > function.
> > >
> > > - Dennis Gove
> > >
> > >
> > > On Fri, Apr 15, 2016 at 3:21 AM, Bastien Latard - MDPI AG <
> > > lat...@mdpi.com.invalid> wrote:
> > >
> > >> '*would I then be able to query a specific field of articles or other
> > >> "table" (with the same OR BETTER performances)?*'
> > >> -> And especially, would I be able to get only 1 article in the
> > result...
> > >>
> > >> On 15/04/2016 09:06, Bastien Latard - MDPI AG wrote:
> > >>
> > >> Thanks Jack.
> > >>
> > >> I know that Solr is a search engine, but this replace a search in my
> > >> mysql DB with this model:
> > >>
> > >>
> > >> *My goal is to improve my environment (and my performances at the same
> > >> time).*
> > >>
> > >> *Yes, I have a Solr data model... but atm I created 4 different
> indexes
> > >> for "similar service usage".*
> > >> *So atm, for 70 millions of documents, I am duplicating journal data
> and
> > >> publisher data all the time in 1 index (for all articles from the same
> > >> journal/pub) in order to be able to retrieve all data in 1 query...*
> > >>
> > >> *I found yesterday that there is the possibility to create like an
> array
> > >> of  in the data-conf.xml.*
> > >> e.g. (pseudo code - incomplete):
> > >> 
> > >> 
> > >> 
> > >> 
> > >>
> > >>
> > >> * Would this be a good option? Is this the denormalization you were
> > >> proposing? *
> > >>
> > >> *If yes, would I then be able to query a specific field of articles or
> > >> other "table" (with the same OR BETTER performances)? If yes, I might
> > >> probably merge all the different indexes together. *
> > >> *I'm currently joining everything in mysql, so duplicating the fields
> in
> > >> the solr (pseudo code):*
> > >> 
> > >> *So I have an index for authors query, a general one for articles
> (only
> > >> needed info of other tables) ...*
> > >>
> > >> Thanks in advance for the tips. :)
> > >>
> > >> Kind regards,
> > >> Bastien
> > >>
> > >> On 14/04/2016 16:23, Jack Krupansky wrote:
> > >>
> > >> Solr is a search engine, not a database.
> > >>
> > >> JOINs? Although 

Re: Singular Plural Results Inconsistent - SOLR v3.6 and EnglishMinimalStemFilterFactor

2016-04-15 Thread Shawn Heisey
On 4/15/2016 8:15 AM, Sara Woodmansee wrote:
> When I suggested the developer consider upgrading to v5.5 or 6.0 (from v3.6), 
> this was their response.  It’s clear that upgrading is not going to happen 
> any time soon.
>
> Developer response:  "But to use SOLR 5, there is a need to find a stable and 
> reliable php client. And until very recent time there were no release. In 
> other case we would have to write PHP client itself.  Then we would have to 
> rewrite integration API with a software, because API very likely has changed. 
> And then make changes to every single piece of code in backend and frontend 
> of our system that is tied up with search functionality in any way. “
>
> — I would still like to know (from you folks) if the “stable PHP client” 
> issue still holds true?  Perhaps that is not an easy question.

There should be PHP clients with Solr4 support.  Those should work well
with 5.x.  I don't know enough about 6 to comment on how compatible it
would be.

All PHP clients are third-party -- the project didn't write any of
them.  Which PHP client are you using now?

Thanks,
Shawn



Re: Adding docValues in schema - Solr Cloud 4.8.1

2016-04-15 Thread Shawn Heisey
On 4/15/2016 7:42 AM, Vincenzo D'Amore wrote:
> I would like to add docValues to few fields definition in production.
>
> I first tried in a test environment during partial reindexing and it seems
> have no effect, (i.e. no real benefits with small number of documents to
> reindex, 30% of total).

Adding docValues requires a full reindex.  If your index meets the
criteria for Atomic Updates, then you could do an atomic update on every
document, but either way, you're going to be indexing every document again.

The problem with not reindexing is that certain Solr features will
switch to use docValues if the schema says the field has them ... but
until you reindex, your existing documents will not actually contain
docValues, so those features will not work on the majority of your
index.  Those features will *not* fall back to indexed data if the
schema says docValues="true".

The list of features that won't work right without a reindex is the list
of features that benefit from docValues (sorting, faceting, grouping),
so usually there's no reason to add docValues unless you're using at
least one of those features.
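For reference, an atomic update request itself is tiny — the unique key plus a
per-field operation — but Solr still reconstructs and reindexes the whole
stored document behind the scenes. A sketch of building one such payload
(field names are illustrative):

```java
import java.util.Locale;

public class AtomicUpdateDoc {
    // Build the JSON body for one atomic "set" operation. Solr reads the
    // unique key plus {"set": value}, merges it with the stored fields, and
    // rewrites the full document, so the document is fully reindexed even
    // though the request only names one field.
    static String setField(String id, String field, double value) {
        return String.format(Locale.ROOT,
                "{\"id\":\"%s\",\"%s\":{\"set\":%s}}", id, field, value);
    }

    public static void main(String[] args) {
        // POST this to /update with Content-Type: application/json
        System.out.println(setField("doc1", "price_f", 9.99));
        // → {"id":"doc1","price_f":{"set":9.99}}
    }
}
```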

Thanks,
Shawn



Re: Solr best practices for many to many relations...

2016-04-15 Thread Jack Krupansky
It will be interesting to see which use cases work best with the new
streaming JOIN vs. which will remain best with full denormalization, or
whether you simply have to try both and benchmark them.

My impression had been that streaming JOIN would be ideal for bulk
operations rather than traditional-style search queries. Maybe there are
three use cases: bulk read based on broad criteria, top-n relevance search
query, and specific document (or small number of documents) based on
multiple fields.

My suspicion is that doing JOIN on five tables will likely be slower than
accessing a single document of a denormalized table/index.

-- Jack Krupansky

On Fri, Apr 15, 2016 at 9:56 AM, Joel Bernstein  wrote:

> Solr now has full distributed join capabilities as part of the Streaming
> Expression library. Keep in mind that these are distributed joins so they
> shuffle records to worker nodes to perform the joins. These are comparable
> to joins done by SQL over MapReduce systems, but they are very responsive
> and can respond with sub-second response time for fairly large joins in
> parallel mode. But these joins do lend themselves to large distributed
> architectures (lot's of shards an replicas). Target QPS also needs to be
> taken into account and tested in deciding whether these joins will meet the
> specific use case.
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
> On Fri, Apr 15, 2016 at 9:17 AM, Dennis Gove  wrote:
>
> > The Streaming API with Streaming Expressions (or Parallel SQL if you want
> > to use SQL) can give you the functionality you're looking for. See
> > https://cwiki.apache.org/confluence/display/solr/Streaming+Expressions
> > and
> > https://cwiki.apache.org/confluence/display/solr/Parallel+SQL+Interface.
> > SQL queries coming in through the Parallel SQL Interface are translated
> > down into Streaming Expressions - if you need to do something that SQL
> > doesn't yet support you should check out the Streaming Expressions to see
> > if it can support it.
> >
> > With these you could store your data in separate collections (or the same
> > collection with different docType field values) and then during search
> > perform a join (inner, outer, hash) across the collections. You could, if
> > you wanted, even join with data NOT in solr using the jdbc streaming
> > function.
> >
> > - Dennis Gove
> >
> >
> > On Fri, Apr 15, 2016 at 3:21 AM, Bastien Latard - MDPI AG <
> > lat...@mdpi.com.invalid> wrote:
> >
> >> '*would I then be able to query a specific field of articles or other
> >> "table" (with the same OR BETTER performances)?*'
> >> -> And especially, would I be able to get only 1 article in the
> result...
> >>
> >> On 15/04/2016 09:06, Bastien Latard - MDPI AG wrote:
> >>
> >> Thanks Jack.
> >>
> >> I know that Solr is a search engine, but this replaces a search in my
> >> mysql DB with this model:
> >>
> >>
> >> *My goal is to improve my environment (and my performances at the same
> >> time).*
> >>
> >> *Yes, I have a Solr data model... but atm I created 4 different indexes
> >> for "similar service usage".*
> >> *So atm, for 70 millions of documents, I am duplicating journal data and
> >> publisher data all the time in 1 index (for all articles from the same
> >> journal/pub) in order to be able to retrieve all data in 1 query...*
> >>
> >> *I found yesterday that there is the possibility to create like an array
> >> of  in the data-conf.xml.*
> >> e.g. (pseudo code - incomplete):
> >> 
> >> 
> >> 
> >> 
> >>
> >>
> >> * Would this be a good option? Is this the denormalization you were
> >> proposing? *
> >>
> >> *If yes, would I then be able to query a specific field of articles or
> >> other "table" (with the same OR BETTER performances)? If yes, I might
> >> probably merge all the different indexes together. *
> >> *I'm currently joining everything in mysql, so duplicating the fields in
> >> the solr (pseudo code):*
> >> 
> >> *So I have an index for authors query, a general one for articles (only
> >> needed info of other tables) ...*
> >>
> >> Thanks in advance for the tips. :)
> >>
> >> Kind regards,
> >> Bastien
> >>
> >> On 14/04/2016 16:23, Jack Krupansky wrote:
> >>
> >> Solr is a search engine, not a database.
> >>
> >> JOINs? Although Solr does have some limited JOIN capabilities, they are
> >> more for special situations, not the front-line go-to technique for data
> >> modeling for search.
> >>
> >> Rather, denormalization is the front-line go-to technique for data
> >> modeling in Solr.
> >>
> >> In any case, the first step in data modeling is always to focus on your
> >> queries - what information will be coming into your apps and what
> >> information will the apps want to access based on those inputs.
> >>
> >> But wait... you say you are upgrading, which suggests that you have an
> >> existing Solr data model, and probably queries as well. So...
> >>
> >> 1. Share at least a summary of your existing Solr data model as well 

Solr json api, metrics calculation

2016-04-15 Thread Iana Bondarska
Hi All,
could you please help me with solr metrics on json api:
1) I don't see a count metric in the list of supported metrics -- is it really
not supported now? E.g. I have records like this:
city  name
NY    johnson
LA    smith
NY    null
LA    johnson

And I want the count of names grouped by city. It seems that for now only
a distinct count is available.
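
For reference, each bucket of a terms facet in the JSON Facet API carries a doc count by default; restricting the count to non-null names can be done with a filter. A hedged sketch of such a request body (field names taken from the example above):

```json
{
  "query": "*:*",
  "filter": "name:[* TO *]",
  "facet": {
    "by_city": { "type": "terms", "field": "city" }
  }
}
```

With the sample rows above, this should return LA with count 2 and NY with count 1, since the filter drops the row whose name is null.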

2) does average include null values? I checked the old solr api -- it seems
that it does not include nulls in the average. But the json api does. Is that
a bug, or will they work differently?

Thanks,
Iana


Question on Solr JDBC driver with SQL client like DB Visualizer

2016-04-15 Thread Reth RM
Note: I followed the steps mentioned in the pdf attached on this Jira
https://issues.apache.org/jira/browse/SOLR-8521

Page 11: the screenshot says to select "solr-solrj-6.0.0-SNAPSHOT.jar", which
is equivalent to the "solr-solrj-6.0.0.jar" shipped with the released version,
correct?

When I try adding that jar, it doesn't show the driver class; DBVisualizer
still shows "No new driver class". Does that mean the class is not included in
this jar yet?


Re: Singular Plural Results Inconsistent - SOLR v3.6 and EnglishMinimalStemFilterFactor

2016-04-15 Thread Sara Woodmansee
Hi all,

When I suggested the developer consider upgrading to v5.5 or 6.0 (from v3.6), 
this was their response.  It’s clear that upgrading is not going to happen any 
time soon.

Developer response:  "But to use SOLR 5, there is a need to find a stable and 
reliable php client. And until very recent time there were no release. In other 
case we would have to write PHP client itself.  Then we would have to rewrite 
integration API with a software, because API very likely has changed. And then 
make changes to every single piece of code in backend and frontend of our 
system that is tied up with search functionality in any way. “

— I would still like to know (from you folks) if the “stable PHP client” issue 
still holds true?  Perhaps that is not an easy question.

Thanks, as always,
Sara
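
For reference, a hedged sketch of the analyzer change being debated, in schema.xml terms (the field type name and tokenizer here are assumptions, not the actual config from this thread):

```xml
<fieldType name="text_en" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- swapped in where solr.EnglishMinimalStemFilterFactory was;
         Porter stems more aggressively (e.g. trees -> tree) -->
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
</fieldType>
```

Any field using this type would need to be reindexed after the change, since stemming happens at index time as well as query time.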


> On Apr 14, 2016, at 5:28 PM, Sara Woodmansee  wrote:
> 
> Thanks Jack.
> 
> So - if I understand (all email feedback thus far) correctly:  
> 
> — Upgrading to newer version vital (5.5 —6.0)
> 
> — EnglishMinimalStemFilter:  upgrading to v5.5-6.0 will NOT help with 
> stemming issues, as code has not been updated.
> 
> — PorterStemFilter:  Has been updated to work better with v5.5 - 6.0
> 
> — Or, perhaps we just need a stemmer that is more dictionary-based 
> (Hunspell?), or inflectional (any suggestions?)
> 
> Thanks again all, for your patience and time!
> Sara
> 
>> On Apr 14, 2016, at 3:51 PM, Jack Krupansky > > wrote:
>> 
>> BTW, I did check and that stemmer code is the same today as it was in 3.x, 
>> so there should be no change in stemmer behavior there.
>> 
>> -- Jack Krupansky
>> 
>> On Thu, Apr 14, 2016 at 3:47 PM, Sara Woodmansee > > wrote:
>> 
>>> Hi Shawn,
>>> 
>>> Thanks so much the feedback. And for the heads-up regarding (the bad form
>>> of) starting a new discussion from an existing one. Thought removing all
>>> content wouldn’t track to original. (Sigh). This is what you get when you
>>> have photographers posting to high-end forums.
>>> 
>>> Thanks Erick, regarding upgrading to v5.  We actually just removed all
>>> test data from the site, so we can now upload all the true, final files and
>>> metadata. In some ways this could be a perfect time to upgrade to v5 (if I
>>> can talk the developer into it) since all metadata has to be re-ingested
>>> anyway..
>>> 
>>> All best,
>>> Sara
>>> 
>>> 
 On Apr 14, 2016, at 3:31 PM, Erick Erickson >
>>> wrote:
 
 re: upgrading to 5.x... 5X Solr's are NOT guaranteed to read 3x indexes, 
 you'd have to go through 4x to do that.
 
 If you can re-index from scratch that would be best.
 
 Best,
 Erick
 
 
> On Apr 14, 2016, at 3:29 PM, Shawn Heisey  > wrote:
> 
> On 4/14/2016 11:17 AM, Sara Woodmansee wrote:
>> I posted yesterday, however I never received my own post, so worried
>>> it did not go through (?)
> 
> I *did* see your previous message, but couldn't immediately think of
> anything constructive to say.  I've had a little bit of time on my lunch
> break today to look deeper.
> 
> EnglishMinimalStemFilter is designed to *not* aggressively stem
> everything it sees.  It appears that the behavior you are seeing is
> probably intentional with that filter.
> 
> In 5.5.0 and 6.0.0, PorterStemFilter will handle words of the form you
> mentioned correctly.  In the screenshot below, PSF means
> "PorterStemFilter".  I did not check any earlier versions.  I already
> had these versions on my system.
> 
> https://www.dropbox.com/s/ss48vinrtbgifce/stemmer-ee-es-6.0.0.png?dl=0 
> 
> 
> https://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#Stemming
> 
> That version of Solr is over four years old.  Bugs in 3.x will *not* be
> fixed.  Bugs in 4.x will also not be fixed.  On 5.x, only extremely
> major bugs are likely to get any attention, and this does not qualify as
> a major bug.
> 
> 
> 
> On another matter:
> 
> http://people.apache.org/~hossman/#threadhijack
> 
> You replied to a message with the subject "Solr Support for BM25F" ...
> so your message is showing up within that thread.
> 
> 
>>> https://www.dropbox.com/s/xi0o8z6smhd2n5d/woodmansee-thread-hijack.png?dl=0 
>>> 
> 
> Thanks,
> Shawn
> 
 
>>> 
> 



Re: Solr best practices for many to many relations...

2016-04-15 Thread Joel Bernstein
You may also want to keep an eye on SOLR-8925 which supports distributed,
cross collection graph traversals. This may be useful in traversing the
relationships.

Joel Bernstein
http://joelsolr.blogspot.com/

On Fri, Apr 15, 2016 at 9:56 AM, Joel Bernstein  wrote:

> Solr now has full distributed join capabilities as part of the Streaming
> Expression library. Keep in mind that these are distributed joins so they
> shuffle records to worker nodes to perform the joins. These are comparable
> to joins done by SQL over MapReduce systems, but they are very responsive
> and can respond with sub-second response time for fairly large joins in
> parallel mode. But these joins do lend themselves to large distributed
> architectures (lots of shards and replicas). Target QPS also needs to be
> taken into account and tested in deciding whether these joins will meet the
> specific use case.
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
> On Fri, Apr 15, 2016 at 9:17 AM, Dennis Gove  wrote:
>
>> The Streaming API with Streaming Expressions (or Parallel SQL if you want
>> to use SQL) can give you the functionality you're looking for. See
>> https://cwiki.apache.org/confluence/display/solr/Streaming+Expressions
>> and
>> https://cwiki.apache.org/confluence/display/solr/Parallel+SQL+Interface.
>> SQL queries coming in through the Parallel SQL Interface are translated
>> down into Streaming Expressions - if you need to do something that SQL
>> doesn't yet support you should check out the Streaming Expressions to see
>> if it can support it.
>>
>> With these you could store your data in separate collections (or the same
>> collection with different docType field values) and then during search
>> perform a join (inner, outer, hash) across the collections. You could, if
>> you wanted, even join with data NOT in solr using the jdbc streaming
>> function.
>>
>> - Dennis Gove
>>
>>
>> On Fri, Apr 15, 2016 at 3:21 AM, Bastien Latard - MDPI AG <
>> lat...@mdpi.com.invalid> wrote:
>>
>>> '*would I then be able to query a specific field of articles or other
>>> "table" (with the same OR BETTER performances)?*'
>>> -> And especially, would I be able to get only 1 article in the result...
>>>
>>> On 15/04/2016 09:06, Bastien Latard - MDPI AG wrote:
>>>
>>> Thanks Jack.
>>>
>>> I know that Solr is a search engine, but this replaces a search in my
>>> mysql DB with this model:
>>>
>>>
>>> *My goal is to improve my environment (and my performances at the same
>>> time).*
>>>
>>> *Yes, I have a Solr data model... but atm I created 4 different indexes
>>> for "similar service usage".*
>>> *So atm, for 70 millions of documents, I am duplicating journal data and
>>> publisher data all the time in 1 index (for all articles from the same
>>> journal/pub) in order to be able to retrieve all data in 1 query...*
>>>
>>> *I found yesterday that there is the possibility to create like an array
>>> of  in the data-conf.xml.*
>>> e.g. (pseudo code - incomplete):
>>> 
>>> 
>>> 
>>> 
>>>
>>>
>>> * Would this be a good option? Is this the denormalization you were
>>> proposing? *
>>>
>>> *If yes, would I then be able to query a specific field of articles or
>>> other "table" (with the same OR BETTER performances)? If yes, I might
>>> probably merge all the different indexes together. *
>>> *I'm currently joining everything in mysql, so duplicating the fields in
>>> the solr (pseudo code):*
>>> 
>>> *So I have an index for authors query, a general one for articles (only
>>> needed info of other tables) ...*
>>>
>>> Thanks in advance for the tips. :)
>>>
>>> Kind regards,
>>> Bastien
>>>
>>> On 14/04/2016 16:23, Jack Krupansky wrote:
>>>
>>> Solr is a search engine, not a database.
>>>
>>> JOINs? Although Solr does have some limited JOIN capabilities, they are
>>> more for special situations, not the front-line go-to technique for data
>>> modeling for search.
>>>
>>> Rather, denormalization is the front-line go-to technique for data
>>> modeling in Solr.
>>>
>>> In any case, the first step in data modeling is always to focus on your
>>> queries - what information will be coming into your apps and what
>>> information will the apps want to access based on those inputs.
>>>
>>> But wait... you say you are upgrading, which suggests that you have an
>>> existing Solr data model, and probably queries as well. So...
>>>
>>> 1. Share at least a summary of your existing Solr data model as well as
>>> at least a summary of the kinds of queries you perform today.
>>> 2. Tell us what exactly is driving your inquiry - are queries too slow,
>>> too cumbersome, not sufficiently powerful, or... what exactly is the
>>> problem you need to solve.
>>>
>>>
>>> -- Jack Krupansky
>>>
>>> On Thu, Apr 14, 2016 at 10:12 AM, Bastien Latard - MDPI AG <
>>> lat...@mdpi.com.invalid> wrote:
>>>
 Hi Guys,

 *I am upgrading from solr 4.2 to 6.0.*
 *I successfully (after some time) migrated the config 

Re: Solr best practices for many to many relations...

2016-04-15 Thread Joel Bernstein
Solr now has full distributed join capabilities as part of the Streaming
Expression library. Keep in mind that these are distributed joins so they
shuffle records to worker nodes to perform the joins. These are comparable
to joins done by SQL over MapReduce systems, but they are very responsive
and can respond with sub-second response time for fairly large joins in
parallel mode. But these joins do lend themselves to large distributed
architectures (lots of shards and replicas). Target QPS also needs to be
taken into account and tested in deciding whether these joins will meet the
specific use case.

Joel Bernstein
http://joelsolr.blogspot.com/

On Fri, Apr 15, 2016 at 9:17 AM, Dennis Gove  wrote:

> The Streaming API with Streaming Expressions (or Parallel SQL if you want
> to use SQL) can give you the functionality you're looking for. See
> https://cwiki.apache.org/confluence/display/solr/Streaming+Expressions
> and
> https://cwiki.apache.org/confluence/display/solr/Parallel+SQL+Interface.
> SQL queries coming in through the Parallel SQL Interface are translated
> down into Streaming Expressions - if you need to do something that SQL
> doesn't yet support you should check out the Streaming Expressions to see
> if it can support it.
>
> With these you could store your data in separate collections (or the same
> collection with different docType field values) and then during search
> perform a join (inner, outer, hash) across the collections. You could, if
> you wanted, even join with data NOT in solr using the jdbc streaming
> function.
>
> - Dennis Gove
>
>
> On Fri, Apr 15, 2016 at 3:21 AM, Bastien Latard - MDPI AG <
> lat...@mdpi.com.invalid> wrote:
>
>> '*would I then be able to query a specific field of articles or other
>> "table" (with the same OR BETTER performances)?*'
>> -> And especially, would I be able to get only 1 article in the result...
>>
>> On 15/04/2016 09:06, Bastien Latard - MDPI AG wrote:
>>
>> Thanks Jack.
>>
>> I know that Solr is a search engine, but this replaces a search in my
>> mysql DB with this model:
>>
>>
>> *My goal is to improve my environment (and my performances at the same
>> time).*
>>
>> *Yes, I have a Solr data model... but atm I created 4 different indexes
>> for "similar service usage".*
>> *So atm, for 70 millions of documents, I am duplicating journal data and
>> publisher data all the time in 1 index (for all articles from the same
>> journal/pub) in order to be able to retrieve all data in 1 query...*
>>
>> *I found yesterday that there is the possibility to create like an array
>> of  in the data-conf.xml.*
>> e.g. (pseudo code - incomplete):
>> 
>> 
>> 
>> 
>>
>>
>> * Would this be a good option? Is this the denormalization you were
>> proposing? *
>>
>> *If yes, would I then be able to query a specific field of articles or
>> other "table" (with the same OR BETTER performances)? If yes, I might
>> probably merge all the different indexes together. *
>> *I'm currently joining everything in mysql, so duplicating the fields in
>> the solr (pseudo code):*
>> 
>> *So I have an index for authors query, a general one for articles (only
>> needed info of other tables) ...*
>>
>> Thanks in advance for the tips. :)
>>
>> Kind regards,
>> Bastien
>>
>> On 14/04/2016 16:23, Jack Krupansky wrote:
>>
>> Solr is a search engine, not a database.
>>
>> JOINs? Although Solr does have some limited JOIN capabilities, they are
>> more for special situations, not the front-line go-to technique for data
>> modeling for search.
>>
>> Rather, denormalization is the front-line go-to technique for data
>> modeling in Solr.
>>
>> In any case, the first step in data modeling is always to focus on your
>> queries - what information will be coming into your apps and what
>> information will the apps want to access based on those inputs.
>>
>> But wait... you say you are upgrading, which suggests that you have an
>> existing Solr data model, and probably queries as well. So...
>>
>> 1. Share at least a summary of your existing Solr data model as well as
>> at least a summary of the kinds of queries you perform today.
>> 2. Tell us what exactly is driving your inquiry - are queries too slow,
>> too cumbersome, not sufficiently powerful, or... what exactly is the
>> problem you need to solve.
>>
>>
>> -- Jack Krupansky
>>
>> On Thu, Apr 14, 2016 at 10:12 AM, Bastien Latard - MDPI AG <
>> lat...@mdpi.com.invalid> wrote:
>>
>>> Hi Guys,
>>>
>>> *I am upgrading from solr 4.2 to 6.0.*
>>> *I successfully (after some time) migrated the config files and other
>>> parameters...*
>>>
>>> Now I'm just wondering if my indexes are following the best
>>> practices...(and they are probably not :-) )
>>>
>>> What would be the best if we have this kind of sql data to write in Solr:
>>>
>>>
>>> I have several different services which need (more or less), different
>>> data based on these JOINs...
>>>
>>> e.g.:
>>> Service A needs lots of data (but not all),
>>> 

Anticipated Solr 5.5.1 release date

2016-04-15 Thread Tom Evans
Hi all

We're currently using Solr 5.5.0 and converting our regular old style
facets into JSON facets, and are running in to SOLR-8155 and
SOLR-8835. I can see these have already been back-ported to 5.5.x
branch, does anyone know when 5.5.1 may be released?

We don't particularly want to move to Solr 6, as we have only just
finished validating 5.5.0 with our original queries!

Cheers

Tom


Adding docValues in schema - Solr Cloud 4.8.1

2016-04-15 Thread Vincenzo D'Amore
Dear Solr Gurus :),

I would like to add docValues to few fields definition in production.

I first tried it in a test environment during a partial reindex, and it seems
to have no effect (i.e. no real benefit with a small number of documents to
reindex, 30% of the total).

So I have to wait a full reindexing in order to be sure all documents have
fields with docValues enabled.

During the partial reindex, I suppose that new documents have fields with
docValues enabled while old documents do not. And I would like to change the
production configuration seamlessly, i.e. without being forced to fully
reindex everything.

The question is: is it safe to add docValues to a field definition in an
existing production collection, or may the partial reindexing leave the
collection in an inconsistent (or critical) state?
In other words, should I fully reindex immediately after the schema change
and collection reload?
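
For reference, the kind of schema change under discussion looks like this (the field name and type here are assumptions; in 4.x, docValues is supported on string and trie-numeric field types):

```xml
<field name="manufacturer" type="string" indexed="true" stored="true"
       docValues="true"/>
```

Segments written before the change will not carry docValues for already-indexed documents, which is why a full reindex is generally recommended before relying on the field for sorting, faceting, or grouping.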

Best regards,
Vincenzo


-- 
Vincenzo D'Amore
email: v.dam...@gmail.com
skype: free.dev
mobile: +39 349 8513251


Re: Solr best practices for many to many relations...

2016-04-15 Thread Dennis Gove
The Streaming API with Streaming Expressions (or Parallel SQL if you want
to use SQL) can give you the functionality you're looking for. See
https://cwiki.apache.org/confluence/display/solr/Streaming+Expressions and
https://cwiki.apache.org/confluence/display/solr/Parallel+SQL+Interface.
SQL queries coming in through the Parallel SQL Interface are translated
down into Streaming Expressions - if you need to do something that SQL
doesn't yet support you should check out the Streaming Expressions to see
if it can support it.

With these you could store your data in separate collections (or the same
collection with different docType field values) and then during search
perform a join (inner, outer, hash) across the collections. You could, if
you wanted, even join with data NOT in solr using the jdbc streaming
function.

- Dennis Gove
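
For reference, a hedged sketch of what such a cross-collection join looks like as a Streaming Expression (collection and field names are assumptions; both streams must be sorted on the join key):

```
innerJoin(
  search(articles, q="*:*", fl="id,journal_id,title",        sort="journal_id asc"),
  search(journals, q="*:*", fl="journal_id,journal_name",    sort="journal_id asc"),
  on="journal_id"
)
```

Sent to a collection's /stream handler, this would emit one tuple per article, each carrying the matching journal fields.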


On Fri, Apr 15, 2016 at 3:21 AM, Bastien Latard - MDPI AG <
lat...@mdpi.com.invalid> wrote:

> '*would I then be able to query a specific field of articles or other
> "table" (with the same OR BETTER performances)?*'
> -> And especially, would I be able to get only 1 article in the result...
>
> On 15/04/2016 09:06, Bastien Latard - MDPI AG wrote:
>
> Thanks Jack.
>
> I know that Solr is a search engine, but this replaces a search in my mysql
> DB with this model:
>
>
> *My goal is to improve my environment (and my performances at the same
> time).*
>
> *Yes, I have a Solr data model... but atm I created 4 different indexes
> for "similar service usage".*
> *So atm, for 70 millions of documents, I am duplicating journal data and
> publisher data all the time in 1 index (for all articles from the same
> journal/pub) in order to be able to retrieve all data in 1 query...*
>
> *I found yesterday that there is the possibility to create like an array
> of  in the data-conf.xml.*
> e.g. (pseudo code - incomplete):
> 
> 
> 
> 
>
>
> * Would this be a good option? Is this the denormalization you were
> proposing? *
>
> *If yes, would I then be able to query a specific field of articles or
> other "table" (with the same OR BETTER performances)? If yes, I might
> probably merge all the different indexes together. *
> *I'm currently joining everything in mysql, so duplicating the fields in
> the solr (pseudo code):*
> 
> *So I have an index for authors query, a general one for articles (only
> needed info of other tables) ...*
>
> Thanks in advance for the tips. :)
>
> Kind regards,
> Bastien
>
> On 14/04/2016 16:23, Jack Krupansky wrote:
>
> Solr is a search engine, not a database.
>
> JOINs? Although Solr does have some limited JOIN capabilities, they are
> more for special situations, not the front-line go-to technique for data
> modeling for search.
>
> Rather, denormalization is the front-line go-to technique for data
> modeling in Solr.
>
> In any case, the first step in data modeling is always to focus on your
> queries - what information will be coming into your apps and what
> information will the apps want to access based on those inputs.
>
> But wait... you say you are upgrading, which suggests that you have an
> existing Solr data model, and probably queries as well. So...
>
> 1. Share at least a summary of your existing Solr data model as well as at
> least a summary of the kinds of queries you perform today.
> 2. Tell us what exactly is driving your inquiry - are queries too slow,
> too cumbersome, not sufficiently powerful, or... what exactly is the
> problem you need to solve.
>
>
> -- Jack Krupansky
>
> On Thu, Apr 14, 2016 at 10:12 AM, Bastien Latard - MDPI AG <
> lat...@mdpi.com.invalid> wrote:
>
>> Hi Guys,
>>
>> *I am upgrading from solr 4.2 to 6.0.*
>> *I successfully (after some time) migrated the config files and other
>> parameters...*
>>
>> Now I'm just wondering if my indexes are following the best
>> practices...(and they are probably not :-) )
>>
>> What would be the best if we have this kind of sql data to write in Solr:
>>
>>
>> I have several different services which need (more or less), different
>> data based on these JOINs...
>>
>> e.g.:
>> Service A needs lots of data (but not all),
>> Service B needs a few data (some fields already included in A),
>> Service C needs a bit more data than B(some fields already included in
>> A/B)...
>>
>> *1. Would it be better to create one single index?*
>> *-> i.e.: this will duplicate journal info for every single article*
>>
>> *2. Would it be better to create several specific indexes for each
>> similar services?*
>>
>>
>>
>>
>>
>> *-> i.e.: this will use more space on the disks (and there are
>> ~70millions of documents to join) 3. Would it be better to create an index
>> per table and make a join? -> if yes, how?? *
>>
>> Kind regards,
>> Bastien
>>
>>
>
> Kind regards,
> Bastien Latard
> Web engineer
> --
> MDPI AG
> Postfach, CH-4005 Basel, Switzerland
> Office: Klybeckstrasse 64, CH-4057
> Tel. +41 61 683 77 35
> Fax: +41 61 302 89 18
> E-mail: latard@mdpi.comhttp://www.mdpi.com/
>
>
> 

Can a field be an array of fields?

2016-04-15 Thread Bastien Latard - MDPI AG

The same email, but with formatting...
(email below)

 Forwarded Message 
Subject:Can a field be an array of fields?
Date:   Fri, 15 Apr 2016 13:51:48 +0200
From:   Bastien Latard - MDPI AG 
To: solr-user@lucene.apache.org



Hi everybody!

/I described a bit what I found in another thread, but I prefer to 
create a new thread for this specific question.../

*It's possible to create an array of strings by doing (incomplete example):
- in the data-conf.xml:*


   
 
 
 
 
   



*- in schema.xml:
*required="false" multiValued="true" />
required="false" multiValued="true" />
required="false" multiValued="true" />
required="false" multiValued="true" />
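
The mailing-list archive has stripped the XML tags out of the two snippets above; a hedged sketch of what the nested DataImportHandler entities and the corresponding schema fields typically look like (entity, table, column, and type names are assumptions):

```xml
<!-- data-conf.xml: one sub-entity row per author, yielding multiValued fields -->
<entity name="article" query="SELECT id, title FROM articles">
  <entity name="author"
          query="SELECT given_name, last_name FROM authors
                 WHERE article_id = '${article.id}'"/>
</entity>

<!-- schema.xml: the stripped field definitions plausibly resembled -->
<field name="given_name" type="string" indexed="true" stored="true"
       required="false" multiValued="true"/>
<field name="last_name"  type="string" indexed="true" stored="true"
       required="false" multiValued="true"/>
```

As the message notes, flattening sub-entity rows into parallel multiValued fields loses the pairing between given_name and last_name when one value is missing.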


And this provides something like:

"docs":[
  {
[...]
"given_name":["Bastien",  "Matthieu",  "Nino"],
"last_name":["lastname1", "lastname2", "lastname3",   
"lastname4"],

[...]


*Note: there can be one author with only a last_name, and then we are 
unable to tell which one it is...*


My goal would be to get this as a result:

"docs":[
  {
[...]
   "authors_array":
[   
[
"given_name":["Bastien"],
"last_name":["lastname1"]
],
[
"last_name":["lastname2"]
],
[
"given_name":["Matthieu"],
"last_name":["lastname2"]
],
[
"given_name":["Nino"],
"last_name":["lastname4"]
],
]
[...]


Is there any way to do this?
/PS: I know that I could do '//select if(a.given_name is not null, 
a.given_name ,'') as given_name, [...]//' but I would like to get an 
array.../


I tried to add something like that to the schema.xml, but this doesn't 
work (well, it might be of type 'array'):
required="false" multiValued="true"/>


Kind regards,
Bastien Latard
Web engineer
--
MDPI AG
Postfach, CH-4005 Basel, Switzerland
Office: Klybeckstrasse 64, CH-4057
Tel. +41 61 683 77 35
Fax: +41 61 302 89 18
E-mail:
lat...@mdpi.com
http://www.mdpi.com/







Error starting Solr-6.0.0 in HDFS mode (in Windows 7)

2016-04-15 Thread Rohana Rajapakse
java.nio.file.InvalidPathException:java.nio.file.InvalidPathException: Illegal 
char <:> at index 4: hdfs:\\myserver:9000\solr

It doesn't like the colon.  I have tried starting solr on windows command line 
with:

bin/solr start -Dsolr.directoryFactory=HdfsDirectoryFactory
 -Dsolr.lock.type=hdfs
 -Dsolr.data.dir=hdfs://host:port/path
 -Dsolr.updatelog=hdfs://host:port/path

And also by making the necessary settings in the solrconfig.xml file.

I am using solr-6.0.0
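
For reference, a hedged sketch of the solrconfig.xml form of those settings (host and path are placeholders, following the Solr Reference Guide's HDFS example):

```xml
<directoryFactory name="DirectoryFactory" class="solr.HdfsDirectoryFactory">
  <str name="solr.hdfs.home">hdfs://host:port/solr</str>
</directoryFactory>

<!-- and, inside <indexConfig>: -->
<lockType>${solr.lock.type:hdfs}</lockType>
```

Note the URI scheme form (hdfs://...) with forward slashes; the backslashes in the exception above suggest the path is being run through Windows file-path handling before the hdfs scheme is recognized.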


Rohana








Re: Adding replica on solr - 5.50

2016-04-15 Thread Jaroslaw Rozanski
Hi,

Does the `=...` actually work for you? When attempting something similar with
Solr 5.3.1, despite what the documentation said, I had to use
`node_name=...`.


Thanks,
Jarek
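
For reference, a hedged sketch of the Collections API call being discussed (host, collection, and shard names are assumptions; the node parameter takes the node_name form host:port_solr, with an underscore):

```shell
curl "http://localhost:8983/solr/admin/collections?action=ADDREPLICA&collection=mycollection&shard=shard1&node=x.x.x.x:9001_solr"
```

The node value must match an entry under live_nodes in ZooKeeper, so the target Solr instance has to be up and registered before the call succeeds.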

On Fri, 15 Apr 2016, at 05:48, John Bickerstaff wrote:
> Another thought - again probably not it, but just in case...
> 
> Shouldn't this: =x.x.x.x:9001_solr
> 
> 
> Actually be this?  =x.x.x.x:9001/solr
> 
> 
> (Note the / instead of _ )
> 
> On Thu, Apr 14, 2016 at 10:45 PM, John Bickerstaff
>  > wrote:
> 
> > Jay - it's probably too simple, but the error says "not currently active"
> > which could, of course, mean that although it's up and running, it's not
> > listening on the port you have in the command line...  Or that the port is
> > blocked by a firewall or other network problem.
> >
> > I note that you're using ports different from the default 8983 for your
> > Solr instances...
> >
> > You probably checked already, but I thought I'd mention it.
> >
> >
> > On Thu, Apr 14, 2016 at 8:30 PM, John Bickerstaff <
> > j...@johnbickerstaff.com> wrote:
> >
> >> Thanks Eric!
> >>
> >> I'll look into that immediately - yes, I think that cURL would qualify as
> >> scriptable for my IT lead.
> >>
> >> In the end, I found I could do it two ways...
> >>
> >> Either copy the entire solr data directory over to /var/solr/data on the
> >> new machine, change the directory name and the entries in the
> >> core.properties file, then start the already-installed Solr in cloud mode -
> >> everything came up roses in the cloud section of the UI - the new replica
> >> was there as part of the collection, properly named and worked fine.
> >>
> >> Alternatively, I used the command I mentioned earlier and then waited as
> >> the data was replicated over to the newly-created replica -- again,
> >> everything was roses in the Cloud section of the Admin UI...
> >>
> >> What might I have messed up in this scenario?  I didn't love the hackish
> >> feeling either, but had been unable to find anything like the addreplica -
> >> although I did look for a fairly long time - I'm glad to know about it now.
> >>
> >>
> >>
> >> On Thu, Apr 14, 2016 at 7:36 PM, Erick Erickson 
> >> wrote:
> >>
> >>> bq:  the Solr site about how to add a
> >>> replica to a Solr cloud.  The Admin UI appears to require that the
> >>> directories be created anyway
> >>>
> >>> No, no, a thousand times NO! You're getting confused,
> >>> I think, with the difference between _cores_ and _collections_
> >>> (or replicas in a collection).
> >>>
> >>> Do not use the admin UI for _cores_ to create replicas. It's possible
> >>> if (and only if) you do it exactly correctly. Instead, use the
> >>> collections API
> >>> ADDREPLICA command here:
> >>>
> >>> https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-api_addreplica
> >>>
> >>> Which you could cURL etc., does that qualify as "scripting" in your
> >>> situation?
> >>>
> >>> You're right, the Solr instance must be up and running for the replica to
> >>> be added, but that's not onerous
> >>>
> >>>
> >>> The bin/solr script is a "work in progress", and doesn't have direct
> >>> support
> >>> for "addreplica", but it could be added.
> >>>
> >>> Best,
> >>> Erick
> >>>
> >>> On Thu, Apr 14, 2016 at 6:22 PM, John Bickerstaff
> >>>  wrote:
> >>> > Sure - couldn't agree more.
> >>> >
> >>> > I couldn't find any good documentation on the Solr site about how to
> >>> add a
> >>> > replica to a Solr cloud.  The Admin UI appears to require that the
> >>> > directories be created anyway.
> >>> >
> >>> > There is probably a way to do it through the UI, once Solr is
> >>> installed on
> >>> > a new machine - and IIRC, I did manage that, but my IT guy wanted
> >>> > scriptable command lines.
> >>> >
> >>> > Also, IIRC, the stuff I did on the command line actually showed the
> >>> API URL
> >>> > as part of the output so Jay could try that and see what the difference
> >>> > is...
> >>> >
> >>> > Jay - I'm going offline now, but if you're still stuck tomorrow, I'll
> >>> try
> >>> > to recreate... I have a VM snapshot just before I issued the command...
> >>> >
> >>> > Keep in mind everything I did was in a Solr Cloud...
> >>> >
> >>> > On Thu, Apr 14, 2016 at 6:21 PM, Jeff Wartes 
> >>> wrote:
> >>> >
> >>> >> I’m all for finding another way to make something work, but I feel
> >>> like
> >>> >> this is the wrong advice.
> >>> >>
> >>> >> There are two options:
> >>> >> 1) You are doing something wrong. In which case, you should probably
> >>> >> invest in figuring out what.
> >>> >> 2) Solr is doing something wrong. In which case, you should probably
> >>> >> invest in figuring out what, and then file a bug so it doesn’t happen
> >>> to
> >>> >> anyone 
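
Erick's ADDREPLICA suggestion above can be scripted as a one-line cURL call. The sketch below only builds and prints the URL; the host, collection, shard, and node names are placeholder assumptions, not values from this thread:

```shell
# Sketch of scripting the Collections API ADDREPLICA call discussed above.
# SOLR_HOST, COLLECTION, SHARD, and TARGET_NODE are placeholder assumptions.
SOLR_HOST="localhost:8983"
COLLECTION="mycollection"
SHARD="shard1"
TARGET_NODE="10.0.0.5:9001_solr"   # node_name format: host:port_solr

URL="http://${SOLR_HOST}/solr/admin/collections?action=ADDREPLICA&collection=${COLLECTION}&shard=${SHARD}&node=${TARGET_NODE}"
echo "$URL"
# Against a live cluster, issue it with:
# curl "$URL"
```

Note that, as far as I know, the node parameter expects the node_name format with an underscore (host:port_solr), not a path with a slash.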

Re: Solr Sharding Strategy

2016-04-15 Thread Bhaumik Joshi
Hi,

Toke - I tried pausing the indexing entirely, but only got a slight improvement, 
so the impact of indexing is not that significant.

Shawn - To answer your question: I am sending one document per update 
request.

My test Solr cloud is configured with 2 shards on one machine, each of which 
has a replica on another machine. To check whether network latency was the 
bottleneck, I disabled the replicas and re-ran the test, but saw no 
improvement.

Another thing I tried, in order to balance the load and provide more CPU and 
memory resources: I configured only 2 shards, each on a separate machine with 
no replicas, and ran the test again, but in that case performance got worse.

As for production, we want to have 2 shards in order to make the platform 
scalable and future-proof. For context, we have 22 collections in production; 
4 of them are major in terms of volume and complexity and are frequently used 
for querying and indexing, while the rest are comparatively minor and see 
fewer query and index hits. Below are the production index statistics.

No of collections: 22 collections having 139 million documents with index size 
of 85 GB.
Major collections: 4 collections having 134 million documents with index size 
of 77 GB.
Minor collections: 18 collections having 5 million documents with index size of 
8 GB.

So, any ideas on how to improve query performance given these statistics, 
along with the index-heavy (100 index updates per sec) and query-heavy (100 
queries per sec) workload?

Thanks & Regards,
Bhaumik Joshi


From: Shawn Heisey 
Sent: Tuesday, April 12, 2016 7:37 AM
To: solr-user@lucene.apache.org
Subject: Re: Solr Sharding Strategy

On 4/11/2016 6:31 AM, Bhaumik Joshi wrote:
> We are using solr 5.2.0 and we have Index-heavy (100 index updates per
> sec) and Query-heavy (100 queries per sec) scenario.
>
> *Index stats: *10 million documents and 16 GB index size
>
>
>
> Which sharding strategy is best suited in above scenario?
>
> Please share reference resources which states detailed comparison of
> single shard over multi shard if any.
>
>
>
> Meanwhile we did some tests with SolrMeter (Standalone java tool for
> stress tests with Solr) for single shard and two shards.
>
> *Index stats of test solr cloud: *0.7 million documents and 1 GB index
> size.
>
> As observed in test average query time with 2 shards is much higher
> than single shard.
>

On the same hardware, multiple shards will usually be slower than one
shard, especially under a high load.  Sharding can give good results
with *more* hardware, providing more CPU and memory resources.  When the
query load is high, there should only be only one core (shard replica)
per server, and Solr works best when it is running on bare metal, not
virtualized.

Handling 100 queries per second will require multiple copies of your
index on separate hardware.  This is a fairly high query load.  There
are installations handling much higher loads, of course.  Those
installations have a LOT of replicas and some way to balance load across
them.

For 10 million documents and 16GB of index, I'm not sure that I would
shard at all, just make sure that each machine has plenty of memory --
probably somewhere in the neighborhood of 24GB to 32GB.  That assumes
that Solr is the only thing running on that server, and that if it's
virtualized, making sure that the physical server's memory is not
oversubscribed.
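
A minimal sketch of the arithmetic behind that 24GB-to-32GB neighborhood (the 16GB index size comes from the thread; the 8GB heap is an assumed figure for illustration, not a recommendation from this message):

```shell
# Rule of thumb: total RAM ≈ JVM heap + free memory for the OS disk cache
# to hold the index. HEAP_GB is an assumption; INDEX_GB is from the thread.
HEAP_GB=8
INDEX_GB=16
TOTAL_GB=$((HEAP_GB + INDEX_GB))
echo "Suggested minimum RAM: ${TOTAL_GB}GB"
```

Add headroom on top of that figure for Java overhead and any other processes on the box, which is how you land in the 24GB-32GB range.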

Regarding your specific numbers:

The low queries per second may be caused by one or more of these
problems, or perhaps something I haven't thought of:  1) your queries
are particularly heavy.  2) updates are interfering by tying up scarce
resources.  3) you don't have enough memory in the machine.

How many documents are in each update request that you are sending?  In
another thread on the list, you have stated that you have a 1 second
maxTime on autoSoftCommit.  This is *way* too low, and a *major* source
of performance issues.  Very few people actually need that level of
latency -- a maxTime measured in minutes may be fast enough, and is much
friendlier for performance.
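
If the 1-second autoSoftCommit maxTime is the culprit, it can be raised without hand-editing solrconfig.xml by using the Config API available in Solr 5.x. This is a sketch: the 60-second value and the collection name in the commented curl are assumptions, not values from this thread:

```shell
# Sketch: raise autoSoftCommit maxTime to 60 seconds via the Config API.
# The payload is only printed here; the commented curl shows how to send it.
PAYLOAD='{"set-property":{"updateHandler.autoSoftCommit.maxTime":60000}}'
echo "$PAYLOAD"
# Against a live node (collection name is a placeholder):
# curl -H 'Content-Type: application/json' \
#      "http://localhost:8983/solr/mycollection/config" -d "$PAYLOAD"
```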

Thanks,
Shawn

Re: Solr best practices for many to many relations...

2016-04-15 Thread Bastien Latard - MDPI AG
"Would I then be able to query a specific field of articles or other 
'table' (with the same OR BETTER performance)?"

-> And especially, would I be able to get only 1 article in the result...

On 15/04/2016 09:06, Bastien Latard - MDPI AG wrote:

Thanks Jack.

I know that Solr is a search engine, but this replaces a search in my 
MySQL DB with this model:



My goal is to improve my environment (and my performance at the same 
time).

Yes, I have a Solr data model... but atm I created 4 different 
indexes for "similar service usage".
So atm, for 70 million documents, I am duplicating journal data 
and publisher data all the time in 1 index (for all articles from the 
same journal/publisher) in order to be able to retrieve all data in 1 query...

I found yesterday that there is the possibility to create something like 
an array of <entity> definitions in the data-conf.xml.

e.g. (pseudo code - incomplete):

Would this be a good option? Is this the denormalization you were 
proposing?
If yes, would I then be able to query a specific field of articles 
or other "tables" (with the same OR BETTER performance)?

If yes, I might merge all the different indexes together.
I'm currently joining everything in MySQL, so duplicating the fields 
in Solr (pseudo code):

So I have an index for author queries, a general one for articles 
(only the needed info from other tables)...

Thanks in advance for the tips. :)

Kind regards,
Bastien

On 14/04/2016 16:23, Jack Krupansky wrote:

Solr is a search engine, not a database.

JOINs? Although Solr does have some limited JOIN capabilities, they 
are more for special situations, not the front-line go-to technique 
for data modeling for search.


Rather, denormalization is the front-line go-to technique for data 
modeling in Solr.


In any case, the first step in data modeling is always to focus on 
your queries - what information will be coming into your apps and 
what information will the apps want to access based on those inputs.


But wait... you say you are upgrading, which suggests that you have 
an existing Solr data model, and probably queries as well. So...


1. Share at least a summary of your existing Solr data model as well 
as at least a summary of the kinds of queries you perform today.
2. Tell us what exactly is driving your inquiry - are queries too 
slow, too cumbersome, not sufficiently powerful, or... what exactly 
is the problem you need to solve.



-- Jack Krupansky

On Thu, Apr 14, 2016 at 10:12 AM, Bastien Latard - MDPI AG 
 wrote:


Hi Guys,

I am upgrading from solr 4.2 to 6.0.
I successfully (after some time) migrated the config files and
other parameters...

Now I'm just wondering if my indexes are following the best
practices...(and they are probably not :-) )

What would be the best if we have this kind of sql data to write
in Solr:


I have several different services which need (more or less),
different data based on these JOINs...

e.g.:
Service A needs lots of data (but not all),
Service B needs a few data (some fields already included in A),
Service C needs a bit more data than B(some fields already
included in A/B)...

1. Would it be better to create one single index?
-> i.e.: this will duplicate journal info for every single
article

2. Would it be better to create several specific indexes for
each similar service?
-> i.e.: this will use more space on the disks (and there are
~70 million documents to join)

3. Would it be better to create an index per table and make a join?
-> if yes, how?

Kind regards,
Bastien




Kind regards,
Bastien Latard
Web engineer
--
MDPI AG
Postfach, CH-4005 Basel, Switzerland
Office: Klybeckstrasse 64, CH-4057
Tel. +41 61 683 77 35
Fax: +41 61 302 89 18
E-mail:
lat...@mdpi.com
http://www.mdpi.com/


Kind regards,
Bastien Latard
Web engineer
--
MDPI AG
Postfach, CH-4005 Basel, Switzerland
Office: Klybeckstrasse 64, CH-4057
Tel. +41 61 683 77 35
Fax: +41 61 302 89 18
E-mail:
lat...@mdpi.com
http://www.mdpi.com/
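
For reference, the nested-entity pattern being described (the actual XML snippet did not survive the archive) typically looks something like the sketch below. All table, column, and field names are invented for illustration, and the script writes a sample file under /tmp rather than touching a real install:

```shell
# Sketch of nested DataImportHandler entities: a parent article row joined
# to its journal row. All table/column/field names are invented placeholders.
cat > /tmp/data-config-sketch.xml <<'EOF'
<dataConfig>
  <dataSource driver="org.postgresql.Driver" url="jdbc:postgresql://localhost/db"/>
  <document>
    <entity name="article" query="SELECT id, title, journal_id FROM article">
      <field column="id" name="id"/>
      <field column="title" name="title"/>
      <entity name="journal"
              query="SELECT name FROM journal WHERE id='${article.journal_id}'">
        <field column="name" name="journal_name"/>
      </entity>
    </entity>
  </document>
</dataConfig>
EOF
grep -c '<entity' /tmp/data-config-sketch.xml
```

The inner entity runs once per parent row, which is exactly the journal-data duplication discussed above: each indexed article document carries a copy of its journal fields.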



Re: Solr best practices for many to many relations...

2016-04-15 Thread Bastien Latard - MDPI AG

Thanks Jack.

I know that Solr is a search engine, but this replaces a search in my 
MySQL DB with this model:



My goal is to improve my environment (and my performance at the same 
time).

Yes, I have a Solr data model... but atm I created 4 different indexes 
for "similar service usage".
So atm, for 70 million documents, I am duplicating journal data 
and publisher data all the time in 1 index (for all articles from the 
same journal/publisher) in order to be able to retrieve all data in 1 query...

I found yesterday that there is the possibility to create something like 
an array of <entity> definitions in the data-conf.xml.

e.g. (pseudo code - incomplete):

Would this be a good option? Is this the denormalization you were proposing?
If yes, would I then be able to query a specific field of articles or 
other "tables" (with the same OR BETTER performance)?

If yes, I might merge all the different indexes together.
I'm currently joining everything in MySQL, so duplicating the fields 
in Solr (pseudo code):

So I have an index for author queries, a general one for articles (only 
the needed info from other tables)...

Thanks in advance for the tips. :)

Kind regards,
Bastien

On 14/04/2016 16:23, Jack Krupansky wrote:

Solr is a search engine, not a database.

JOINs? Although Solr does have some limited JOIN capabilities, they 
are more for special situations, not the front-line go-to technique 
for data modeling for search.


Rather, denormalization is the front-line go-to technique for data 
modeling in Solr.


In any case, the first step in data modeling is always to focus on 
your queries - what information will be coming into your apps and what 
information will the apps want to access based on those inputs.


But wait... you say you are upgrading, which suggests that you have an 
existing Solr data model, and probably queries as well. So...


1. Share at least a summary of your existing Solr data model as well 
as at least a summary of the kinds of queries you perform today.
2. Tell us what exactly is driving your inquiry - are queries too 
slow, too cumbersome, not sufficiently powerful, or... what exactly is 
the problem you need to solve.



-- Jack Krupansky

On Thu, Apr 14, 2016 at 10:12 AM, Bastien Latard - MDPI AG 
> wrote:


Hi Guys,

I am upgrading from solr 4.2 to 6.0.
I successfully (after some time) migrated the config files and
other parameters...

Now I'm just wondering if my indexes are following the best
practices...(and they are probably not :-) )

What would be the best if we have this kind of sql data to write
in Solr:


I have several different services which need (more or less),
different data based on these JOINs...

e.g.:
Service A needs lots of data (but not all),
Service B needs a few data (some fields already included in A),
Service C needs a bit more data than B(some fields already
included in A/B)...

1. Would it be better to create one single index?
-> i.e.: this will duplicate journal info for every single article

2. Would it be better to create several specific indexes for
each similar service?
-> i.e.: this will use more space on the disks (and there are
~70 million documents to join)

3. Would it be better to create an index per table and make a join?
-> if yes, how?

Kind regards,
Bastien




Kind regards,
Bastien Latard
Web engineer
--
MDPI AG
Postfach, CH-4005 Basel, Switzerland
Office: Klybeckstrasse 64, CH-4057
Tel. +41 61 683 77 35
Fax: +41 61 302 89 18
E-mail:
lat...@mdpi.com
http://www.mdpi.com/



Re:Re: solr 5.2.1, data import issue, shown processed rows doesn't match actually indexed doc quantity.

2016-04-15 Thread cqlangyi
hi guys,


thank you very much for the help. Sorry I've been so late to reply.


1. "commit" didn't help.
after commit, the 'numFound' of "*:*" query is still the same.


2. The "id" field in every doc is generated by Solr using UUID. I have
no idea how to check if there is a duplicate one, but I assume
there shouldn't be, unless Solr cloud has some known bug when
using UUIDs in a distributed environment.


The environment is:


Solr cloud with:
3 Linux boxes, using ZooKeeper 3.4.6 + Solr 5.2.1, Oracle JDK 1.7.80


any ideas?


thank you very much.






At 2016-04-05 12:09:14, "John Bickerstaff"  wrote:
>Both of us implied it, but to be completely clear - if you have a duplicate
>ID in your data set, SOLR will throw away previous documents with that ID
>and index the new one.  That's fine if your duplicates really are
>duplicates - it's not OK if there's a problem in the data set and the
>duplicates ID's are on documents that are actually unique.
>
>On Mon, Apr 4, 2016 at 9:51 PM, John Bickerstaff 
>wrote:
>
>> Sweet - that's a good point - I ran into that too - I had not run the
>> commit for the last "batch" (I was using SolrJ) and so numbers didn't match
>> until I did.
>>
>> On Mon, Apr 4, 2016 at 9:50 PM, Binoy Dalal 
>> wrote:
>>
>>> 1) Are you sure you don't have duplicates?
>>> 2) All of your records might have been indexed but a new searcher may not
>>> have opened on the updated index yet. Try issuing a commit and see if that
>>> works.
>>>
>>> On Tue, 5 Apr 2016, 08:56 cqlangyi,  wrote:
>>>
>>> > hi there,
>>> >
>>> >
>>> > i have an solr 5.2.1,  when i do data import, after the job is done,
>>> it's
>>> > shown 165,191 rows processed successfully.
>>> >
>>> >
>>> > but when i query with *:*, the "numFound" shown only 163,349 docs in
>>> index.
>>> >
>>> >
>>> > when i tred to do it again, , it's shown 165,191 rows processed
>>> > successfully. but the *:* query result now is 162,390.
>>> >
>>> >
>>> > no errors in any log,
>>> >
>>> >
>>> > any idea?
>>> >
>>> >
>>> > thank you very much!
>>> >
>>> >
>>> > cq
>>> >
>>> >
>>> >
>>> >
>>> >
>>> >
>>> >
>>> >
>>> > At 2016-04-05 09:19:48, "Chris Hostetter" 
>>> > wrote:
>>> > >
>>> > >: I am not sure how to use "Sort By Function" for Case.
>>> > >:
>>> > >:
>>> |10#40|14#19|33#17|27#6|15#6|19#5|7#2|6#1|29#1|5#1|30#1|28#1|12#0|20#0|
>>> > >:
>>> > >: Can you tell how to fetch 40 when input is 10.
>>> > >
>>> > >Something like...
>>> > >
>>> >
>>> >
>>> >if(termfreq(f,10),40,if(termfreq(f,14),19,if(termfreq(f,33),17,)))
>>> > >
>>> > >But i suspect there may be a much better way to achieve your ultimate
>>> goal
>>> > >if you tell us what it is.  what do these fields represent? what makes
>>> > >these numeric valuessignificant? do you know which values are
>>> significant
>>> > >when indexing, or do they vary for every query?
>>> > >
>>> > >https://people.apache.org/~hossman/#xyproblem
>>> > >XY Problem
>>> > >
>>> > >Your question appears to be an "XY Problem" ... that is: you are
>>> dealing
>>> > >with "X", you are assuming "Y" will help you, and you are asking about
>>> "Y"
>>> > >without giving more details about the "X" so that we can understand the
>>> > >full issue.  Perhaps the best solution doesn't involve "Y" at all?
>>> > >See Also: http://www.perlmonks.org/index.pl?node_id=542341
>>> > >
>>> > >
>>> > >
>>> > >
>>> > >-Hoss
>>> > >http://www.lucidworks.com/
>>> >
>>> --
>>> Regards,
>>> Binoy Dalal
>>>
>>
>>
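
The suggestions in this thread (issue an explicit hard commit, then re-run the count) can be scripted. The sketch below only builds and prints the URLs; the host, port, and collection name are placeholder assumptions, not values from this thread:

```shell
# Sketch: force a hard commit, then query the document count, so the
# numFound figure reflects everything indexed. BASE is a placeholder.
BASE="http://localhost:8983/solr/mycollection"
COMMIT_URL="${BASE}/update?commit=true"
COUNT_URL="${BASE}/select?q=*:*&rows=0&wt=json"
echo "$COMMIT_URL"
echo "$COUNT_URL"
# Against a live node:
# curl "$COMMIT_URL" && curl "$COUNT_URL"
```

If numFound still disagrees with the rows-processed figure after a hard commit, duplicate uniqueKey values in the source data remain the most likely explanation, since Solr silently overwrites documents that share an id.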


Re: Growing memory?

2016-04-15 Thread Shawn Heisey
On 4/14/2016 1:25 PM, Betsey Benagh wrote:
> bin/solr status shows the memory usage increasing, as does the admin ui.
>
> I'm running this on a shared machine that is supporting several other
> applications, so I can't be particularly greedy with memory usage.  Is
> there anything out there that gives guidelines on what an appropriate
> amount of heap is based on number of documents or whatever?  We're just
> playing around with it right now, but it sounds like we may need a
> different machine in order to load in all of the data we want to have
> available.

That means you're seeing the memory usage from Java's point of view. 
There will be three numbers in the admin UI.  The first is the actual
amount of memory used by the program right at that instant.  The second
is the highest amount of memory that has ever been allocated since the
program started.  The third is the maximum amount of memory that *can*
be allocated.  It's normal for the last two numbers to be the same and
the first number to fluctuate up and down.

From the operating system's point of view, the program will be using the
amount from the middle number on the admin UI, plus some overhead for
Java itself.

https://wiki.apache.org/solr/SolrPerformanceProblems#How_much_heap_space_do_I_need.3F

In addition to having enough heap memory, getting good performance will
require that you have additional memory in the system that is not
allocated to ANY program, which the OS can use to cache your index
data.  The total amount of memory that a well-tuned Solr server requires
often surprises people.  Running Solr with other applications on the
same server may not be a problem if your Solr server load is low and
your indexes are very small, but if your indexes are large and/or Solr
is very busy, those other applications might interfere with Solr
performance.

Thanks,
Shawn
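
The three numbers described above can also be read outside the admin UI from the system info endpoint. This is a sketch: the host/port are assumptions, and the JSON path in the commented pipeline reflects my reading of the system info response, so verify it against your install:

```shell
# Sketch: fetch the JVM memory figures (used / max heap) without the admin UI.
# Host and port are placeholder assumptions; only the URL is built here.
INFO_URL="http://localhost:8983/solr/admin/info/system?wt=json"
echo "$INFO_URL"
# Against a live node (jq used for readability):
# curl -s "$INFO_URL" | jq '.jvm.memory.raw | {used, max}'
```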