Re: SQL rpt_location question

2017-03-24 Thread Joel Bernstein
of the SQL handler. Joel Bernstein http://joelsolr.blogspot.com/ On Fri, Mar 24, 2017 at 10:09 AM, GW <thegeofo...@gmail.com> wrote: > Dear reader, > > I've found that using the distinct clause gives me the list I want. > > I also have a multivalued rpt_location in the col

Re: Difference between hashJoin and innerJoin in Streaming Expression

2017-03-24 Thread Joel Bernstein
. This doesn't require any specific sort but it is limited in size by how much data can fit in the hash map. You can parallelize both joins using the parallel function to improve scalability and performance. Joel Bernstein http://joelsolr.blogspot.com/ On Fri, Mar 24, 2017 at 4:49 AM, Zheng Lin
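The trade-off described in this reply can be sketched in plain Python. This is only an analogy, not Solr code: `hash_join` and `merge_join` are hypothetical helper names, and tuples are modeled as dicts. The hash-based join accepts input in any order but holds one whole side in memory; the merge-based join assumes both inputs are already sorted on the join key.

```python
from collections import defaultdict


def hash_join(left, right, key):
    """Join with no sort requirement; the entire right side is held in a dict."""
    index = defaultdict(list)
    for row in right:
        index[row[key]].append(row)
    for row in left:
        for match in index.get(row[key], []):
            yield {**match, **row}


def merge_join(left, right, key):
    """Join two inputs that are BOTH sorted ascending on `key`."""
    left, right = list(left), list(right)
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][key], right[j][key]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of right-side rows that share this key value.
            j_end = j
            while j_end < len(right) and right[j_end][key] == lk:
                j_end += 1
            # Pair every left row with this key against that run.
            while i < len(left) and left[i][key] == lk:
                for r in right[j:j_end]:
                    yield {**r, **left[i]}
                i += 1
            j = j_end
```

Memory use in `hash_join` grows with the size of the indexed side, which is the size limit the reply mentions. `merge_join` gives wrong answers if either input is unsorted.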

Re: Regex Phrases

2017-03-23 Thread Joel Bernstein
You can also check out https://cwiki.apache.org/confluence/display/solr/Tokenizers#Tokenizers-RegularExpressionPatternTokenizer . Joel Bernstein http://joelsolr.blogspot.com/ On Wed, Mar 22, 2017 at 7:52 PM, Erick Erickson <erickerick...@gmail.com> wrote: > Susheel: > >

Re: Concatenating streams in streaming expressions

2017-03-22 Thread Joel Bernstein
There isn't a cat function yet. The closest function we have currently is a merge function: https://cwiki.apache.org/confluence/display/solr/Streaming+Expressions#StreamingExpressions-merge But I've been meaning to add a cat function so feel free to create the jira. Joel Bernstein http
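As a rough Python analogy (not Solr code), the difference between merging sorted streams and simple concatenation looks like this. The helper names `merged` and `concatenated` and the sample tuples are made up for illustration.

```python
import heapq
from itertools import chain


def merged(*streams, key):
    """Interleave already-sorted streams so the output stays sorted on `key`."""
    return list(heapq.merge(*streams, key=lambda t: t[key]))


def concatenated(*streams):
    """Emit each stream in full, one after another, ignoring sort order."""
    return list(chain(*streams))


a = [{"id": 1}, {"id": 4}, {"id": 7}]
b = [{"id": 2}, {"id": 3}, {"id": 9}]
```

The merged result depends on each input already being sorted on the same key; concatenation has no such requirement.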

Re: model building

2017-03-22 Thread Joel Bernstein
findings in the ticket. Joel Bernstein http://joelsolr.blogspot.com/ On Wed, Mar 22, 2017 at 9:58 AM, Joe Obernberger < joseph.obernber...@gmail.com> wrote: > Thank you Tim. I appreciated the tips. At this point, I'm just trying to > understand how to use it. The 30 tweets that I've sel

Re: model building

2017-03-20 Thread Joel Bernstein
the same seems odd though. The idfs_ds in particular were designed to be accurate when there are multiple training sets in the same collection. Joel Bernstein http://joelsolr.blogspot.com/ On Mon, Mar 20, 2017 at 5:41 PM, Joe Obernberger < joseph.obernber...@gmail.com> wrote: >

Re: Parallelizing post filter for better performance

2017-03-17 Thread Joel Bernstein
You'll probably get better results by trying to get more performance out of your single threaded postfilter. If you can post the code in your collect() method you may get some ideas on how to improve the performance. Joel Bernstein http://joelsolr.blogspot.com/ On Fri, Mar 17, 2017 at 2:13 PM

Re: SQL JOIN eta

2017-03-16 Thread Joel Bernstein
There isn't a jira issue for this yet. For Solr 6.6 there are a few important features lined up: 1) SELECT COUNT(DISTINCT) 2) Date/Time function support 3) Arithmetic function support 4) SELECT ... INTO ... Joel Bernstein http://joelsolr.blogspot.com/ On Tue, Mar 14, 2017 at 10:53 PM, Damien

Re: Using fetch function with streaming expression

2017-03-15 Thread Joel Bernstein
. Joel Bernstein http://joelsolr.blogspot.com/ On Tue, Mar 14, 2017 at 7:53 PM, Pratik Patel <pra...@semandex.net> wrote: > Wow, this is interesting! Is it going to be a new addition to solr or is it > already available cause I can not find it in documentation? I am using solr >

Re: Using fetch function with streaming expression

2017-03-14 Thread Joel Bernstein
count(*)), gt(count(*),1))), fl="concept_name", on="ancestors=conceptid") Joel Bernstein http://joelsolr.blogspot.com/ On Tue, Mar 14, 2017 at 11:51 AM, Pratik Patel <pra...@semandex.net> wrote: > H

Re: Error for Graph Traversal using Streaming Expressions

2017-03-14 Thread Joel Bernstein
Yeah, there have been a lot of changes to configs in Solr 6. All the streaming request handlers have now been made implicit so the solrconfig.xml doesn't include them. Something seems to be stepping on the implicit configs. Joel Bernstein http://joelsolr.blogspot.com/ On Tue, Mar 14, 2017 at 12

Re: Error for Graph Traversal using Streaming Expressions

2017-03-14 Thread Joel Bernstein
Yeah, something is wrong with the configuration, because /export should only be returning JSON. Have you changed the configurations? What were the exact steps you used in setting up the server? Joel Bernstein http://joelsolr.blogspot.com/ On Tue, Mar 14, 2017 at 11:50 AM, Zheng Lin Edwin Yeo

Re: Using fetch function with streaming expression

2017-03-14 Thread Joel Bernstein
explain a little more about the use case? Joel Bernstein http://joelsolr.blogspot.com/ On Tue, Mar 14, 2017 at 11:08 AM, Pratik Patel <pra...@semandex.net> wrote: > I have two types of documents in my index. eventLink and concepttData. > > eventLink { ancestors:[,] } > concep

Re: Error with Streaming Expressions - shortestPath

2017-03-14 Thread Joel Bernstein
Ok. I updated the other thread with a URL to run based on what I'm seeing in the logs. Try running that URL and let's see what comes back. Joel Bernstein http://joelsolr.blogspot.com/ On Tue, Mar 14, 2017 at 10:26 AM, Zheng Lin Edwin Yeo <edwinye...@gmail.com> wrote: >

Re: Error for Graph Traversal using Streaming Expressions

2017-03-14 Thread Joel Bernstein
try running the following query: http://localhost:8983/solr/email/export?{!terms+f%3Dfrom}ed...@mail.com =false=from,to=to+asc,from+asc=json=2.2 Let's see what comes back from this. Joel Bernstein http://joelsolr.blogspot.com/ On Tue, Mar 14, 2017 at 10:20 AM, Zheng Lin Edwin Yeo <edwi

Re: Error with Streaming Expressions - shortestPath

2017-03-14 Thread Joel Bernstein
Looks like there might be something strange with your configuration. Did you upgrade an existing install or is this a standard Solr 6.4.1 install? Joel Bernstein http://joelsolr.blogspot.com/ On Tue, Mar 14, 2017 at 6:22 AM, Zheng Lin Edwin Yeo <edwinye...@gmail.com> wrote: > Hi, >

Re: Error for Graph Traversal using Streaming Expressions

2017-03-14 Thread Joel Bernstein
You're getting JSON parse errors that look like you're getting an XML response. Do you see any errors in the logs other than the stack trace? I suspect there might be another error above the stack trace which shows the error from the server that's causing it to respond with XML. Joel Bernstein

Re: Iterating sorted result docs in a custom search component

2017-03-13 Thread Joel Bernstein
Are you sorting on a single field, or multiple fields? Joel Bernstein http://joelsolr.blogspot.com/ On Mon, Mar 13, 2017 at 6:49 PM, alexpusch <a...@getjaco.com> wrote: > As have been said, only the top N results are collected, but in order to > find > out which of the results

Re: BooleanEvaluator inside 'having' function of a streaming expression

2017-03-13 Thread Joel Bernstein
the query on the storeid. Joel Bernstein http://joelsolr.blogspot.com/ On Mon, Mar 13, 2017 at 1:06 PM, Pratik Patel <pra...@semandex.net> wrote: > Hi, > > I am trying to write a streaming expression with 'having' function in it. > Following is my simple query. > >

Re: Error for Graph Traversal using Streaming Expressions

2017-03-13 Thread Joel Bernstein
Syntax looks ok. The logs should have a stack trace. One thing it could be is that gatherNodes will only work on single value fields currently. Joel Bernstein http://joelsolr.blogspot.com/ On Mon, Mar 13, 2017 at 1:59 AM, Zheng Lin Edwin Yeo <edwinye...@gmail.com> wrote: > Hi, > >

Re: Iterating sorted result docs in a custom search component

2017-03-12 Thread Joel Bernstein
The /export handler does exactly what you described, but it streams documents rather than trying to sort everything in memory at once. In Solr 4.10 the class is called SortingResponseWriter. You can take a look at the approach used. Joel Bernstein http://joelsolr.blogspot.com/ On Sun, Mar 12, 2017

Re: Using multi valued field in solr cloud Graph Traversal Query

2017-03-10 Thread Joel Bernstein
ou have other questions about how to structure the data or run the queries. Adding multi-value field support is a fairly high priority so I would expect this to be coming in a future release. Joel Bernstein http://joelsolr.blogspot.com/ On Fri, Mar 10, 2017 at 5:15 PM, Pratik Patel <

Re: Solr Json APi aggregation specifiy min and max values

2017-03-10 Thread Joel Bernstein
. The Having clause is implemented in the SQL handler though, not in the json facet API. Joel Bernstein http://joelsolr.blogspot.com/ On Fri, Mar 10, 2017 at 12:55 PM, lazarusjohn <lazarusj...@gmail.com> wrote: > > Is it possible to get average of amount between two values(min and max

Re: Solr JDBC with Core (vs Collection)

2017-03-08 Thread Joel Bernstein
ill release as part of the first release of the significantTerms expression in Solr 6.5. Solr 6.6 will likely have support for all stream source and parallel SQL/JDBC. Joel Bernstein http://joelsolr.blogspot.com/ On Wed, Mar 8, 2017 at 2:19 PM, OTH <omer.t@gmail.com> wrote: > Hello, >

Re: I want to contribute custom made NLP based solr filters but dont know how.

2017-03-07 Thread Joel Bernstein
Yes, I think Apache OpenNLP should be fine. Joel Bernstein http://joelsolr.blogspot.com/ On Tue, Mar 7, 2017 at 8:09 AM, Avtar Singh Mehra <asmehr...@gmail.com> wrote: > Well i have created some filters using Apache OpenNLP. Will it work? > > On 6 March 2017 at 00:30, Joel B

Re: Use Solr Suggest to autocomplete words and suggest co-occurences

2017-03-05 Thread Joel Bernstein
The significantTerms streaming expression could be useful as a co-occurrence based suggester. This is coming in Solr 6.5 but could be easily backported to earlier releases. This blog describes how it works: http://joelsolr.blogspot.com/2017/02/anomaly-detection-in-solr-65.html Joel Bernstein http

Re: I want to contribute custom made NLP based solr filters but dont know how.

2017-03-05 Thread Joel Bernstein
I believe StanfordCore is licensed under the GPL which means it will be incompatible with the Apache License. Would it be possible to port to a different NLP library? Joel Bernstein http://joelsolr.blogspot.com/ On Sun, Mar 5, 2017 at 12:14 PM, Erick Erickson <erickerick...@gmail.com>

Re: Question about best way to architect a Solr application with many data sources

2017-02-23 Thread Joel Bernstein
lists are also stored along with documents and passed to Solr to support document level access control during the search. Joel Bernstein http://joelsolr.blogspot.com/ On Wed, Feb 22, 2017 at 3:01 PM, Tim Casey <tca...@gmail.com> wrote: > I would possibly extend this a b

Re: Field collapsing, facets, and qtime: caching issue?

2017-02-13 Thread Joel Bernstein
: https://issues.apache.org/jira/browse/SOLR-8092. Fixing this problem would likely resolve your scenario as well. I haven't broken ground on it yet though. Joel Bernstein http://joelsolr.blogspot.com/ On Mon, Feb 13, 2017 at 12:52 PM, ronbraun <ronbr...@gmail.com> wrote: >

Re: Is there any alternative to '*' in SQL interfaces?

2017-02-12 Thread Joel Bernstein
eeper ties into the Solr schema. Joel Bernstein http://joelsolr.blogspot.com/ On Sun, Feb 12, 2017 at 10:16 AM, Yasufumi Mizoguchi <yasufumi0...@gmail.com > wrote: > Hi, > > I'm a newbie and trying SQL interfaces on Solr 6.4.1. > > Firstly, I tried to get the values of all fields, b

Re: Field collapsing, facets, and qtime: caching issue?

2017-02-10 Thread Joel Bernstein
on the first and second pass and is not cached, so the query/collapse needs to be run twice for facets. The fix for this would be to start caching the DocSets needed for faceting. Joel Bernstein http://joelsolr.blogspot.com/ On Fri, Feb 10, 2017 at 1:29 PM, Ronald K. Braun <ronbr...@gmail.com>

Re: alerting system with Solr's Streaming Expressions

2017-02-09 Thread Joel Bernstein
Also you can see in the final iteration of the model that there are 8 true positives and 8 false positives. So this model classifies everything as positive. At that point you know that it's not a good model. Joel Bernstein http://joelsolr.blogspot.com/ On Thu, Feb 9, 2017 at 11:03 AM, Joel Bernstein
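The arithmetic behind that judgment can be worked through in a few lines of Python. The counts of 8 true positives and 8 false positives come from the reply above. The zero false negatives is an assumption inferred from "classifies everything as positive", and `precision_recall` is a hypothetical helper, not part of Solr.

```python
def precision_recall(tp, fp, fn):
    """Return (precision, recall) from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# 8 true positives and 8 false positives; fn=0 is assumed because a model
# that labels everything positive cannot miss a positive example.
precision, recall = precision_recall(tp=8, fp=8, fn=0)
```

Under these assumptions precision is 0.5 and recall is 1.0, which is what a classifier that always answers "positive" would score on a balanced set.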

Re: alerting system with Solr's Streaming Expressions

2017-02-09 Thread Joel Bernstein
it selects. Then you need multiple examples for each feature. I was testing with the enron ham/spam data set. It would be good to download that dataset and see what that looks like. Joel Bernstein http://joelsolr.blogspot.com/ On Thu, Feb 9, 2017 at 10:15 AM, Susheel Kumar <susheel2...@gmail.

Re: alerting system with Solr's Streaming Expressions

2017-02-08 Thread Joel Bernstein
Can you post the final iteration of the model? Also the expression you used to train the model? How much training data do you have? How many positive examples and negative examples? Joel Bernstein http://joelsolr.blogspot.com/ On Tue, Feb 7, 2017 at 2:14 PM, Susheel Kumar <sushe

Re: Find groups where at least one item matches a query

2017-02-06 Thread Joel Bernstein
ill include the "groupId" in the ancestor field of each doc id. You'll find that when you master the graph expression syntax you'll be able to do all kinds of interesting graph queries on the data set you've described, which is really best treated as a graph. Joel Bernstein http://joelsolr.blo

Re: Find groups where at least one item matches a query

2017-02-05 Thread Joel Bernstein
Take a look at the graph expressions: https://cwiki.apache.org/confluence/display/solr/Graph+Traversal Joel Bernstein http://joelsolr.blogspot.com/ On Sun, Feb 5, 2017 at 3:43 PM, Alexandre Rafalovitch <arafa...@gmail.com> wrote: > What about collapse and expand with overriden query.

Re: Solr 6 Facet range query over streaming API

2017-02-02 Thread Joel Bernstein
simple aggregations over buckets, but not the kind of automatic date range faceting you're currently using. Aggregations are going to be getting more attention in Streaming Expressions soon, to support additional functionality in Parallel SQL. Joel Bernstein http://joelsolr.blogspot.com/ On Thu

Re: How to combine third party search data as top results ?

2017-02-01 Thread Joel Bernstein
Also this presentation discusses the RankQuery (Starting on slide 16) http://www.slideshare.net/lucidworks/managed-search-presented-by-jacob-graves-getty-images Joel Bernstein http://joelsolr.blogspot.com/ On Wed, Feb 1, 2017 at 9:58 PM, Joel Bernstein <joels...@gmail.com> wrote: >

Re: How to combine third party search data as top results ?

2017-02-01 Thread Joel Bernstein
/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/search/ReRankQParserPlugin.java And the base class this extends: https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/search/AbstractReRankQuery.java Joel Bernstein http://joelsolr.blogspot.com/ On Wed, Feb 1

Re: Solr Kafka DIH

2017-01-31 Thread Joel Bernstein
/JDBCStream.java Joel Bernstein http://joelsolr.blogspot.com/ On Tue, Jan 31, 2017 at 12:07 PM, Mike Thomsen <mikerthom...@gmail.com> wrote: > Probably not, but writing your own little Java process to do it would be > trivial with Kafka 0.9.X or 0.10.X. You can also look at the Confluen

Re: Single call for distributed IDF?

2017-01-31 Thread Joel Bernstein
, but the overhead was so low that it seemed acceptable. This is quite different than what you describe and also quite different than the stats caching approach which is currently in Solr. Maybe I'm just biased toward my own approach, but it seems simple and fast. Joel Bernstein http://joelsolr.blogspot.com

Re: Streaming Expressions result-set fields not in order

2017-01-27 Thread Joel Bernstein
about this issue and update this thread with the issue number. Joel Bernstein http://joelsolr.blogspot.com/ On Wed, Jan 25, 2017 at 9:59 AM, Zheng Lin Edwin Yeo <edwinye...@gmail.com> wrote: > Hi, > > I'm trying out the Streaming Expressions in Solr 6.3.0. > > Currently,

Re: Single call for distributed IDF?

2017-01-24 Thread Joel Bernstein
Reading your blogs now. Joel Bernstein http://joelsolr.blogspot.com/ On Tue, Jan 24, 2017 at 3:28 PM, Joel Bernstein <joels...@gmail.com> wrote: > Ok my mistake, I was thinking you were writing your own component and > needed a fast way to get global IDF. You're looking for fas

Re: Single call for distributed IDF?

2017-01-24 Thread Joel Bernstein
the query and fetch the IDF, then pass it along to the shards? Joel Bernstein http://joelsolr.blogspot.com/ On Tue, Jan 24, 2017 at 2:01 PM, Walter Underwood <wun...@wunderwood.org> wrote: > Specifically, I’m talking about this: > > http://observer.wunderwood.org/ (my blog) >

Re: Single call for distributed IDF?

2017-01-24 Thread Joel Bernstein
Ah, I thought you were just interested in a fast way to get at IDF. This approach does take a callback but it's really fast. Joel Bernstein http://joelsolr.blogspot.com/ On Tue, Jan 24, 2017 at 1:39 PM, Walter Underwood <wun...@wunderwood.org> wrote: > I know how to do it. You

Re: Single call for distributed IDF?

2017-01-24 Thread Joel Bernstein
. Joel Bernstein http://joelsolr.blogspot.com/ On Tue, Jan 24, 2017 at 1:09 PM, Walter Underwood <wun...@wunderwood.org> wrote: > I tried running with the LRUStatsCache for global IDF, but the performance > penalty was pretty big. The 95th percentile response time went from 3.4 > second

Re: NPE when using timeAllowed in the /export handler

2017-01-23 Thread Joel Bernstein
I'd have to put some thought into this. The problem with timeAllowed is that it won't return all the results. So if you're using timeAllowed and performing a join or aggregation it will just give incorrect answers. I'm not sure we want to have that. Joel Bernstein http://joelsolr.blogspot.com/ On Sat

Re: OuterHashJoin doesn't return values

2017-01-23 Thread Joel Bernstein
When you work with relational algebra operations you'll need to specify the /export handler in the search expressions so that all of the tuples are operated on by the join. search(ParentDocuments, q=DocId:1042, fl="Id,DocId,SubDocId", sort="Id asc", qt="/expor

Re: NPE when using timeAllowed in the /export handler

2017-01-21 Thread Joel Bernstein
I'm pretty sure that time allowed and the /export handler are not currently compatible. Joel Bernstein http://joelsolr.blogspot.com/ On Fri, Jan 20, 2017 at 8:57 PM, radha krishnan <dradhakrishna...@gmail.com> wrote: > Hi, > > am trying to query a core with 60 million doc

Re: Streams return default values for fields that doesn't exist in the document

2017-01-21 Thread Joel Bernstein
Also take a look at: http://joelsolr.blogspot.com/2016/10/solr-63-batch-jobs-parallel-etl-and.html It describes a very flexible approach to do batch re-indexing jobs. Joel Bernstein http://joelsolr.blogspot.com/ On Sat, Jan 21, 2017 at 6:44 PM, Yago Riveiro <yago.rive...@gmail.com>

Re: equivalent of json.facet's "gap" keyword in /sql

2017-01-13 Thread Joel Bernstein
The time functions aren't supported in the SQL interface currently. Joel Bernstein http://joelsolr.blogspot.com/ On Fri, Jan 13, 2017 at 10:44 AM, radha krishnan <dradhakrishna...@gmail.com > wrote: > Hi, > > can we write an SQL statement and use the /sql handler to get the >

Re: Stream function is not getting the result

2017-01-11 Thread Joel Bernstein
The http parameter is 'expr', you're using stream.body. The docs contain the basic syntax: https://cwiki.apache.org/confluence/display/solr/Streaming+Expressions Joel Bernstein http://joelsolr.blogspot.com/ On Wed, Jan 11, 2017 at 7:13 AM, Zheng Lin Edwin Yeo <edwinye...@gmail.com>
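A minimal sketch of sending an expression in the `expr` parameter, using only the Python standard library. The base URL, the collection name `collection1`, and the sample expression are assumptions for illustration, and `stream_url` is a hypothetical helper rather than part of any Solr client.

```python
from urllib.parse import urlencode


def stream_url(base_url, collection, expression):
    """Build a /stream request URL carrying the expression in `expr`."""
    return f"{base_url}/{collection}/stream?" + urlencode({"expr": expression})


# Hypothetical host, collection, and expression.
url = stream_url(
    "http://localhost:8983/solr",
    "collection1",
    'search(collection1, q="*:*", fl="id", sort="id asc")',
)
```

`urlencode` percent-encodes the parentheses and quotes, so the expression arrives at the handler as a single parameter value.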

Re: CloudSolrStream can't set the setZkClientTimeout and setZkConnectTimeout properties

2017-01-09 Thread Joel Bernstein
Currently these are not settable. It's easy enough to add a setter for these values. What types of behaviors have you run into when CloudSolrClient is having timeout issues? Joel Bernstein http://joelsolr.blogspot.com/ On Mon, Jan 9, 2017 at 10:06 AM, Yago Riveiro <yago.rive...@gmail.com>

Re: CloudSolrStream client doesn't validate sort order

2017-01-07 Thread Joel Bernstein
We have this fixed in Solr 6.4 coming out next week. Here is the jira: SOLR-9495. Joel Bernstein http://joelsolr.blogspot.com/ On Sat, Jan 7, 2017 at 3:41 PM, Yago Riveiro <yago.rive...@gmail.com> wrote: > Hi, > > The CloudSolrStream client (Solr 6.3.0) assumes that the sort para

Re: Regarding /sql -- WHERE <> IS NULL and IS NOT NULL

2017-01-05 Thread Joel Bernstein
IS NULL and IS NOT NULL predicates are not currently supported. Joel Bernstein http://joelsolr.blogspot.com/ On Thu, Jan 5, 2017 at 2:05 PM, radha krishnan <dradhakrishna...@gmail.com> wrote: > Hi, > > solr version : 6.3 > > will WHERE <> IS NULL / IS NOT NULL

Re: Random Streaming Function not there? SolrCloud 6.3.0

2017-01-04 Thread Joel Bernstein
This issue is resolved for Solr 6.4: https://issues.apache.org/jira/browse/SOLR-9919 I also created an issue to resolve future bugs of this nature: https://issues.apache.org/jira/browse/SOLR-9924 Thanks for the bug report! Joel Bernstein http://joelsolr.blogspot.com/ On Tue, Jan 3, 2017 at 9

Re: Random Streaming Function not there? SolrCloud 6.3.0

2017-01-03 Thread Joel Bernstein
Luckily https://issues.apache.org/jira/browse/SOLR-9103 is available in Solr 6.3 So you can register the random expression through the solrconfig. The ticket shows an example. Joel Bernstein http://joelsolr.blogspot.com/ On Tue, Jan 3, 2017 at 7:59 PM, Joel Bernstein <joels...@gmail.com>

Re: Random Streaming Function not there? SolrCloud 6.3.0

2017-01-03 Thread Joel Bernstein
sure I remember testing random at scale through the /stream handler so I'm not sure how this missed getting committed. I will fix this for Solr 6.4. Joel Bernstein http://joelsolr.blogspot.com/ On Tue, Jan 3, 2017 at 6:46 PM, Joe Obernberger < joseph.obernber...@gmail.com> wrote: > I'

Re: multiple order by clauses with the sql handler

2016-12-29 Thread Joel Bernstein
There are test cases with multiple order by fields for SELECT DISTINCT and GROUP BY. But not for simple SELECT. We are just about to release a new version of the SQL interface which uses Apache Calcite rather than Presto. I'll make sure that this is working in the new release. Joel Bernstein

Re: multiple order by clauses with the sql handler

2016-12-29 Thread Joel Bernstein
This would be a bug. I'll take a look at the test cases and see if there is a test case for this. Joel Bernstein http://joelsolr.blogspot.com/ On Thu, Dec 29, 2016 at 3:17 PM, radha krishnan <dradhakrishna...@gmail.com> wrote: > Hi, > > when i was trying out the SQL functional

Re: Solr on HDFS: Streaming API performance tuning

2016-12-19 Thread Joel Bernstein
/solrj/io/comp/FieldComparator.java Joel Bernstein http://joelsolr.blogspot.com/ On Mon, Dec 19, 2016 at 4:43 PM, Chetas Joshi <chetas.jo...@gmail.com> wrote: > Hi Joel, > > I don't have any solr documents that have NULL values for the sort fields I > use in my queries. > >

Re: Solr on HDFS: Streaming API performance tuning

2016-12-18 Thread Joel Bernstein
Ok, based on the stack trace I suspect one of your sort fields has NULL values, which in the 5x branch could produce null pointers if a segment had no values for a sort field. This is also fixed in the Solr 6x branch. Joel Bernstein http://joelsolr.blogspot.com/ On Sat, Dec 17, 2016 at 2:44 PM

Re: Solr on HDFS: Streaming API performance tuning

2016-12-16 Thread Joel Bernstein
The Streaming API may have been throwing exceptions because the JSON special characters were not escaped. This was fixed in Solr 6.0. Joel Bernstein http://joelsolr.blogspot.com/ On Fri, Dec 16, 2016 at 4:34 PM, Chetas Joshi <chetas.jo...@gmail.com> wrote: > Hello, > > I

Re: Deep dive on the topic() streaming expression

2016-12-13 Thread Joel Bernstein
of stress tests and I've never been able to make it happen. Joel Bernstein http://joelsolr.blogspot.com/ On Tue, Dec 13, 2016 at 11:22 AM, Joel Bernstein <joels...@gmail.com> wrote: > I plan on using this thread to address questions that were posted to >

Deep dive on the topic() streaming expression

2016-12-13 Thread Joel Bernstein
are stream expressions robust enough to be used in production? 5) Is there any more deep dive documentation about topic(). I would love to know its stats for query volume as big as ours (9-10 million). Or, I would love to know how its working internally. Joel Bernstein http://joelsolr.blogspot.com/

Re: "on deck" searcher vs warming searcher

2016-12-09 Thread Joel Bernstein
like exceptions being thrown when searchers are opened too frequently. Joel Bernstein http://joelsolr.blogspot.com/ On Fri, Dec 9, 2016 at 5:42 PM, Trey Grainger <solrt...@gmail.com> wrote: > Shawn and Joel both answered the question with seemingly opposite answers, > but Joel's sho

Re: "on deck" searcher vs warming searcher

2016-12-09 Thread Joel Bernstein
: is an on-deck searcher a warming searcher: the answer is basically yes. Joel Bernstein http://joelsolr.blogspot.com/ On Fri, Dec 9, 2016 at 9:04 AM, Shawn Heisey <apa...@elyograg.org> wrote: > On 12/8/2016 6:08 PM, Brent wrote: > > Is there a difference between an "

Re: Solr 6.3.0 SQL question

2016-11-29 Thread Joel Bernstein
ses. Joel Bernstein http://joelsolr.blogspot.com/ On Tue, Nov 29, 2016 at 11:48 AM, Joe Obernberger < joseph.obernber...@gmail.com> wrote: > Just some data points. > main is an alias for the collection UNCLASS. > > 'stmt=SELECT TextSize from main LIMIT 10' fails > 's

Re: Solr 6.3.0 SQL question

2016-11-29 Thread Joel Bernstein
I'll take a look at the StatsStream and see what the issue is. Joel Bernstein http://joelsolr.blogspot.com/ On Mon, Nov 28, 2016 at 8:32 PM, Damien Kamerman <dami...@gmail.com> wrote: > Aggregated selects only work with lower-case collection names (and no > dashes). (Bug in StatsSt

Re: stream, features and train

2016-11-26 Thread Joel Bernstein
Hi, It looks like the outcome field may not be correct or it may have missing values. You'll need to populate this field for all records in the training set. Joel Bernstein http://joelsolr.blogspot.com/ On Wed, Nov 23, 2016 at 3:21 PM, Joe Obernberger < joseph.obernber...@gmail.com> wrote:

Re: Measuring the entropy of a field

2016-11-15 Thread Joel Bernstein
more useful algorithms. Joel Bernstein http://joelsolr.blogspot.com/ On Tue, Nov 15, 2016 at 10:31 AM, Davis, Daniel (NIH/NLM) [C] < daniel.da...@nih.gov> wrote: > Does Lucene/Solr include any tools for measuring the entropy/information > of a field? My intuition is that this wou

Re: Parallelize Cursor approach

2016-11-10 Thread Joel Bernstein
Solr 5 was very early days for Streaming Expressions. Streaming Expressions and SQL use Java 8 so development switched to the 6.0 branch five months before the 6.0 release. So there was a very large jump in features and bug fixes from Solr 5 to Solr 6 in Streaming Expressions. Joel Bernstein http

Re: Parallelize Cursor approach

2016-11-10 Thread Joel Bernstein
In Solr 5 the /export handler wasn't escaping json text fields, which would produce json parse exceptions. This was fixed in Solr 6.0. Joel Bernstein http://joelsolr.blogspot.com/ On Tue, Nov 8, 2016 at 6:17 PM, Erick Erickson <erickerick...@gmail.com> wrote: > Hmm, that should work

Re: Basic Auth for Solr Streaming Expressions

2016-11-09 Thread Joel Bernstein
Thanks for digging into this, let's create a jira ticket for this. Joel Bernstein http://joelsolr.blogspot.com/ On Wed, Nov 9, 2016 at 6:23 PM, sandeep mukherjee < wiredcit...@yahoo.com.invalid> wrote: > I have more progress since my last mail. I figured out that in the > StreamCo

Re: High CPU Usage in export handler

2016-11-08 Thread Joel Bernstein
. Joel Bernstein http://joelsolr.blogspot.com/ On Mon, Nov 7, 2016 at 3:44 PM, Ray Niu <newry1...@gmail.com> wrote: > Hello: >Any follow up? > > 2016-11-03 11:18 GMT-07:00 Ray Niu <newry1...@gmail.com>: > > > the soft commit is 15 seconds and hard commit is

Re: UpdateProcessor as a batch

2016-11-03 Thread Joel Bernstein
This might be useful. In this scenario you load your content into Solr for staging and perform your ETL from Solr to Solr: http://joelsolr.blogspot.com/2016/10/solr-63-batch-jobs-parallel-etl-and.html Basically Solr becomes a text processing warehouse. Joel Bernstein http://joelsolr.blogspot.com

Re: High CPU Usage in export handler

2016-11-03 Thread Joel Bernstein
Are you doing heavy writes at the time? How many concurrent reads are happening? What version of Solr are you using? What is the field definition for the double, is it docValues? Joel Bernstein http://joelsolr.blogspot.com/ On Thu, Nov 3, 2016 at 12:56 AM, Ray Niu <newry1...@gmail.

Re: Graph Traversal Question

2016-10-26 Thread Joel Bernstein
what relationships they wanted to traverse. Joel Bernstein http://joelsolr.blogspot.com/ On Wed, Oct 26, 2016 at 9:39 AM, Grant Ingersoll <gsing...@apache.org> wrote: > The other way to think about is: I want to put labels on the edges. In my > case, the label is the relations

Re: Graph Traversal Question

2016-10-25 Thread Joel Bernstein
author"], "level":1}, {"node":"Maria","collection":"reviews","field":"user_s","ancestors":["book2"], "relationships":["author"], "level":1}, {"EOF":true,"RE

Re: /export handler to stream data using CloudSolrStream: JSONParse Exception

2016-10-20 Thread Joel Bernstein
ssions. Joel Bernstein http://joelsolr.blogspot.com/ On Thu, Oct 20, 2016 at 5:49 PM, Chetas Joshi <chetas.jo...@gmail.com> wrote: > Hello, > > I am using /export handler to stream data using CloudSolrStream. > > I am using fl=uuid,space,timestamp where uuid and space are Stri

Re: Result Grouping vs. Collapsing Query Parser -- Can one be deprecated?

2016-10-19 Thread Joel Bernstein
the group head. Recently the sort parameter was added to collapse, but this likely is not nearly as fast as using the min/max for selecting group heads. Joel Bernstein http://joelsolr.blogspot.com/ On Wed, Oct 19, 2016 at 7:20 PM, Joel Bernstein <joels...@gmail.com> wrote: > Originally c

Re: Result Grouping vs. Collapsing Query Parser -- Can one be deprecated?

2016-10-19 Thread Joel Bernstein
, is to do it in a way that does not hurt the original performance goal of collapse. Otherwise we'll be back to just having slow grouping. Perhaps the new APIs that are being worked on could have a facade over grouping and collapsing so they would share the same API. Joel Bernstein http

Re: Stream expressions: Break up multivalue field into usable tuples

2016-10-08 Thread Joel Bernstein
systems. Joel Bernstein http://joelsolr.blogspot.com/ On Sat, Oct 8, 2016 at 8:54 PM, Doug Turnbull < dturnb...@opensourceconnections.com> wrote: > Joel -- thanks! Got this working and now feel in a better shape to grok > what's happening > > Out of curiosity, is th

Re: Streaming api and multiValued fields

2016-10-06 Thread Joel Bernstein
Currently the joins in the Streaming API don't support joining on multi-value fields. It will be difficult to support merge joins on multi-value fields but hash joins would be possible in the future. Also the gatherNodes graph expression will support multi-value fields in the future. Joel

Re: Solr 6.2 Distributed joins

2016-10-05 Thread Joel Bernstein
interface is so important. Currently it doesn't support joins, but when it does, it will build the proper streaming expression to do the relational algebra. Joel Bernstein http://joelsolr.blogspot.com/ On Wed, Oct 5, 2016 at 11:44 AM, Gurdeep Singh <gurdeep

Re: How to use StreamingApi MultiFieldComparator?

2016-10-03 Thread Joel Bernstein
Ok, I'll test this out. Joel Bernstein http://joelsolr.blogspot.com/ On Mon, Oct 3, 2016 at 4:40 AM, Markko Legonkov <maxl...@gmail.com> wrote: > here is the stacktrace > > java.io.IOException: Unable to construct instance of > org.apache.solr.client.solrj.io.strea

Re: How to use StreamingApi MultiFieldComparator?

2016-10-01 Thread Joel Bernstein
Also you'll probably need to specify the /export handler in the search expressions, so you get the entire result set. qt="/export" Joel Bernstein http://joelsolr.blogspot.com/ On Sat, Oct 1, 2016 at 6:08 PM, Joel Bernstein <joels...@gmail.com> wrote: > Ok, I
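A small sketch of composing a search expression that carries `qt="/export"`. The helper `search_expr` and the collection and field names are hypothetical, chosen only to show where the parameter goes.

```python
def search_expr(collection, q, fl, sort, export=True):
    """Compose a search(...) streaming expression as a string."""
    fields = ",".join(fl)
    parts = [collection, f'q="{q}"', f'fl="{fields}"', f'sort="{sort}"']
    if export:
        # Ask for the /export handler so the full result set is streamed.
        parts.append('qt="/export"')
    return "search(" + ", ".join(parts) + ")"


# Hypothetical collection and field names.
expr = search_expr("collection1", "*:*", ["id", "price_d"], "id asc")
```

Leaving `export=False` omits the parameter, in which case only the default handler's limited result page would reach the downstream expression.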

Re: How to use StreamingApi MultiFieldComparator?

2016-10-01 Thread Joel Bernstein
Ok, I took a closer look at the expression. I believe this is not supported: sale_price_d!=c_sale_price_d Possibly the complement expression might accomplish what you're trying to do. Joel Bernstein http://joelsolr.blogspot.com/ On Sat, Oct 1, 2016 at 5:59 PM, Joel Bernstein <jo

Re: How to use StreamingApi MultiFieldComparator?

2016-10-01 Thread Joel Bernstein
Hi, can you attach the stack traces from the logs? I'd like to see where this exception is coming from; this appears to be a bug. I'll also need to dig into your expression and see if there is an issue with the syntax. Joel Bernstein http://joelsolr.blogspot.com/ On Sat, Oct 1, 2016 at 2:29 PM, Markko

Re: Whether solr can support 2 TB data?

2016-09-23 Thread Joel Bernstein
will need to experiment with your document set and performance requirements to find your optimal shard size. Joel Bernstein http://joelsolr.blogspot.com/ On Fri, Sep 23, 2016 at 5:16 PM, Shawn Heisey <apa...@elyograg.org> wrote: > On 9/23/2016 2:33 PM, Jeffery Yuan wrote: > > In

Re: Stream expressions: Break up multivalue field into usable tuples

2016-09-22 Thread Joel Bernstein
et open to have scoreNodes operate directly on the facet() function so you don't have to deal with the select() function. https://issues.apache.org/jira/browse/SOLR-9537. I'd like to get to this soon. Joel Bernstein http://joelsolr.blogspot.com/ On Thu, Sep 22, 2016 at 5:02 PM, Doug Turnbul

Re: Hackday next month

2016-09-21 Thread Joel Bernstein
This is great, I'll try to make it up to Boston a day earlier for this. Joel Bernstein http://joelsolr.blogspot.com/ On Wed, Sep 21, 2016 at 8:17 AM, Charlie Hull <char...@flax.co.uk> wrote: > Hi all, > > If you're coming to Lucene Revolution next month in Boston, we're run

Re: [Rerank Query] Distributed search + pagination

2016-09-19 Thread Joel Bernstein
Alessandro, I'll be doing some testing with the re-ranker as part of SOLR-9403 for Solr 6.3. I'll see if I can better understand the issue you're bringing up during the testing. I'll report back to this thread after I've done some testing. Joel Bernstein http://joelsolr.blogspot.com/ On Fri, Sep

Re: Best way to generate multivalue fields from streaming API

2016-09-16 Thread Joel Bernstein
would make a good jira issue. Joel Bernstein http://joelsolr.blogspot.com/ On Fri, Sep 16, 2016 at 11:03 AM, Mike Thomsen <mikerthom...@gmail.com> wrote: > Read this article and thought it could be interesting as a way to do > ingestion: > > https://dzone.com/article

Re: SQL Joins in Parallel SQL Interface

2016-09-14 Thread Joel Bernstein
-with-Streaming-Expressions-LeftOuterJoin-td4290526.html Joel Bernstein http://joelsolr.blogspot.com/ On Wed, Sep 14, 2016 at 6:19 PM, Aswath Srinivasan (TMS) < aswath.sriniva...@toyota.com> wrote: > Hello, > > I'm exploring the Parallel SQL. I don't see any SQL JOIN featu

Re: Miserable Experience ..... Again.

2016-09-12 Thread Joel Bernstein
I'm currently working on upgrading Alfresco from Solr 6.0 to Solr 6.2. Should be easy. Think again. Lucene analyzer changes between Solr 6.0 and Solr 6.2 and a new assert in ConjunctionDISI have caused days of work to perform this simple upgrade. Joel Bernstein http://joelsolr.blogspot.com

Re: [Rerank Query] Distributed search + pagination

2016-09-09 Thread Joel Bernstein
I'm not understanding where the inconsistency comes into play. The re-ranking occurs on the shards. The aggregator node will be sent some docs that have been re-scored and others that have not. But the sorting should be the same as someone pages through the result set. Joel Bernstein http

Re: Solr Grouping, Aggregations and Custom Functions

2016-09-08 Thread Joel Bernstein
Parallel SQL only supports the following functions currently: (SUM, AVG, MIN, MAX, COUNT). More functions and compound functions are on the roadmap. Joel Bernstein http://joelsolr.blogspot.com/ On Thu, Sep 8, 2016 at 12:11 AM, Praveen Babu <subramani@gmail.com> wrote: > Hi All,
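
A rough sketch of staying within that list: the Python snippet below builds a GROUP BY statement, rejects any aggregate outside the five named above, and URL-encodes it as the `stmt` parameter that the /sql handler reads. The function name `build_stmt` and the collection and field names are assumptions for illustration.

```python
from urllib.parse import urlencode

# The aggregate functions named in the message above.
SUPPORTED = ("SUM", "AVG", "MIN", "MAX", "COUNT")

def build_stmt(table, group_field, aggregates):
    """Build a simple GROUP BY statement from (function, field) pairs.

    Hypothetical helper; raises ValueError for unsupported aggregates.
    """
    cols = [group_field]
    for func, field in aggregates:
        if func.upper() not in SUPPORTED:
            raise ValueError(f"unsupported aggregate: {func}")
        cols.append(f"{func.upper()}({field})")
    return f"SELECT {', '.join(cols)} FROM {table} GROUP BY {group_field}"

# Placeholder collection and field names.
stmt = build_stmt("collection1", "category_s", [("sum", "price_d"), ("count", "*")])
body = urlencode({"stmt": stmt})  # form body to POST to the /sql handler
```

Checking the function names on the client side simply gives an earlier, clearer error than waiting for the server to reject the statement.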

Re: Solr [Streaming Expressions/Parallel SQL Interface] Not supporting Multi Value using mapReduce option

2016-09-08 Thread Joel Bernstein
technique. Joel Bernstein http://joelsolr.blogspot.com/ On Thu, Sep 8, 2016 at 2:24 AM, Praveen Babu <subramani@gmail.com> wrote: > Hi , > > > Does parallel sql support Multi valued field? > > I am unable to group by on Multi valued field when I choose > > /sql?aggre

Re: Streaming expression in solr doesnot support collection alias

2016-09-08 Thread Joel Bernstein
Getting aliases working is a high priority and fairly easy to do. We should have this in for Solr 6.3. Joel Bernstein http://joelsolr.blogspot.com/ On Thu, Sep 8, 2016 at 3:18 AM, Tali Finelt <tal...@il.ibm.com> wrote: > Hi All, > > We saw there is an open issue regarding this s
