[jira] [Commented] (SOLR-10317) Solr Nightly Benchmarks

2020-08-12 Thread Ishan Chattopadhyaya (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-10317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17176507#comment-17176507
 ] 

Ishan Chattopadhyaya commented on SOLR-10317:
---------------------------------------------

bq. Ishan Chattopadhyaya, David Smiley: I'm using the Icecat dataset 
(https://github.com/querqy/chorus#sample-data-details) for a project. Today we 
have lots of attributed products (over 8,000 attributes across roughly 100K 
products). We're working on adding actual price data to this dataset, which it 
currently doesn't have, and then I'm mulling over some ideas to generate 
reasonable queries (and judgement lists) to use with this dataset at scale.

[~epugh], I'll give it a try. Thanks for the tip!

> Solr Nightly Benchmarks
> -----------------------
>
> Key: SOLR-10317
> URL: https://issues.apache.org/jira/browse/SOLR-10317
> Project: Solr
>  Issue Type: Task
>Reporter: Ishan Chattopadhyaya
>Assignee: Ishan Chattopadhyaya
>Priority: Major
>  Labels: gsoc2017, mentor
> Attachments: 
> Narang-Vivek-SOLR-10317-Solr-Nightly-Benchmarks-FINAL-PROPOSAL.pdf, 
> Narang-Vivek-SOLR-10317-Solr-Nightly-Benchmarks.docx, SOLR-10317.patch, 
> SOLR-10317.patch, Screenshot from 2017-07-30 20-30-05.png, 
> changes-lucene-20160907.json, changes-solr-20160907.json, managed-schema, 
> solrconfig.xml
>
>
> The benchmarking suite is now here: 
> [https://github.com/thesearchstack/solr-bench]
> Actual datasets and queries are still TBD.
>  
> --- Original description ---
>  Solr needs nightly benchmarks reporting. Similar Lucene benchmarks can be 
> found here, [https://home.apache.org/~mikemccand/lucenebench/].
>  
>  Preferably, we need:
>  # A suite of benchmarks that build Solr from a commit point, start Solr 
> nodes, both in SolrCloud and standalone mode, and record timing information 
> of various operations like indexing, querying, faceting, grouping, 
> replication etc.
>  # It should be possible to run them either as an independent suite or as a 
> Jenkins job, and we should be able to report timings as graphs (Jenkins has 
> some charting plugins).
>  # The code should eventually be integrated in the Solr codebase, so that it 
> never goes out of date.
>  
>  There is some prior work / discussion:
>  # [https://github.com/shalinmangar/solr-perf-tools] (Shalin)
>  # [https://github.com/chatman/solr-upgrade-tests/blob/master/BENCHMARKS.md] 
> (Ishan/Vivek)
>  # SOLR-2646 & SOLR-9863 (Mark Miller)
>  # [https://home.apache.org/~mikemccand/lucenebench/] (Mike McCandless)
>  # [https://github.com/lucidworks/solr-scale-tk] (Tim Potter)
>  
>  There is support for building, starting, indexing/querying and stopping Solr 
> in some of these frameworks above. However, the benchmarks run are very 
> limited. Any of these can be a starting point, or a new framework can as well 
> be used. The motivation is to be able to cover every functionality of Solr 
> with a corresponding benchmark that is run every night.
>  
>  Proposing this as a GSoC 2017 project. I'm willing to mentor, and I'm sure 
> [~shalinmangar] and [~[markrmil...@gmail.com|mailto:markrmil...@gmail.com]] 
> would help here.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-10317) Solr Nightly Benchmarks

2020-08-12 Thread Ishan Chattopadhyaya (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-10317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17176506#comment-17176506
 ] 

Ishan Chattopadhyaya commented on SOLR-10317:
---------------------------------------------

Thanks for the feedback, [~mdrob]. I'll address all your suggestions for 
downloads, especially the one about verification.
As for point 7, chmod errors are unexpected; I'll take a look.
As for point 8, the final results are worth comparing against the same test run 
on another build or another commit point.
As for point 9, indexing threads correspond to how many client threads are used 
to index the documents, while query threads are essentially a measure of 
concurrency while making requests. Increasing query threads helps gauge 
performance under high concurrent load.
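To illustrate what the query-threads knob measures, here is a minimal, self-contained sketch (not taken from solr-bench: `run_query` is a stand-in that merely simulates request latency, where the real suite would issue an HTTP query to a Solr node, and all names and numbers are illustrative):

```python
import time
from concurrent.futures import ThreadPoolExecutor

def run_query(q):
    # Stand-in for one HTTP request to a Solr /select handler;
    # here it just simulates ~10 ms of request latency.
    time.sleep(0.01)
    return q

def throughput(num_threads, queries):
    # Issue every query using `num_threads` concurrent client threads
    # and return overall throughput in queries/second.
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        list(pool.map(run_query, queries))
    return len(queries) / (time.perf_counter() - start)

queries = ["q%d" % i for i in range(100)]
low = throughput(1, queries)    # one client thread
high = throughput(10, queries)  # ten concurrent client threads
# More query threads push the same workload through faster, so measured
# throughput rises until the server (here, the sleep) saturates.
```

Sweeping the thread count upward and watching where throughput flattens (and latency climbs) is what exposes the saturation point of the node under test.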







[jira] [Commented] (SOLR-10317) Solr Nightly Benchmarks

2020-08-12 Thread Mike Drob (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-10317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17176504#comment-17176504
 ] 

Mike Drob commented on SOLR-10317:
----------------------------------

Ishan,

I tried to use the benchmarking framework you have put up to test the 8.6.1 RC 
and ran into a lot of difficulty; it was not a great experience. I went with 
{{config-prebuilt.json}}:
 # Please provide {{mvnw}} or {{gradlew}} for ease of getting started.
 # (minor) Please address your Maven build warnings.
 # When it's prebuilt, I don't want to download a JDK. I deleted this from my 
local copy.
 # Don't download ZooKeeper directly from {{archive.apache.org}}; please use a 
mirror.
 # The start script downloads ZooKeeper 3.5.6, but then the first thing the 
Java program does is download ZooKeeper 3.4.14.
 # If you're going to download things, then verify checksums and signatures. I 
generally don't trust anything that I download from the internet, and I do not 
appreciate how lax this repository is about trust.
 # I get a flood of chmod errors when I try to start.
 # I get some "final results" as JSON, but it is not at all clear to me how to 
compare these with other results.
 # In the config, I see a lot of options for tweaking threads, but it is not 
clear what these values correspond to, or what I will actually be testing if I 
change them.
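Point 6 above is cheap to implement. A minimal sketch of checksum verification, assuming the expected digest has been read out of the {{.sha512}} file that Apache publishes alongside each artifact (the function names here are hypothetical, not from solr-bench):

```python
import hashlib

def sha512_of(path, chunk_size=1 << 20):
    # Stream the file through SHA-512 so large artifacts don't have
    # to fit in memory, then return the hex digest.
    h = hashlib.sha512()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def verify(path, expected_hex):
    # `expected_hex` would come from the .sha512 file published
    # next to the artifact on the Apache download site.
    if sha512_of(path) != expected_hex.strip().lower():
        raise ValueError("checksum mismatch for %s" % path)
    return True

# Self-check against a scratch file standing in for a download:
import os, tempfile
fd, tmp = tempfile.mkstemp()
os.write(fd, b"downloaded artifact bytes")
os.close(fd)
expected = hashlib.sha512(b"downloaded artifact bytes").hexdigest()
ok = verify(tmp, expected)
os.remove(tmp)
```

A checksum only guards against corrupt or truncated downloads; verifying the GPG signature against the project's KEYS file is what actually establishes authenticity.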







[jira] [Commented] (SOLR-10317) Solr Nightly Benchmarks

2020-07-06 Thread David Eric Pugh (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-10317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17151944#comment-17151944
 ] 

David Eric Pugh commented on SOLR-10317:


[~ichattopadhyaya] [~dsmiley] I'm using the Icecat dataset 
(https://github.com/querqy/chorus#sample-data-details) for a project. Today we 
have lots of attributed products (over 8,000 attributes across roughly 100K 
products). We're working on adding actual price data to this dataset, which it 
currently doesn't have, and then I'm mulling over some ideas to generate 
reasonable queries (and judgement lists) to use with this dataset at scale.
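One low-tech way to get "reasonable queries" out of an attributed dataset like this is to sample attribute values off the products themselves. A hedged sketch of the idea (the records and field names below are made up for illustration and are not the actual Icecat schema):

```python
import random

# Hypothetical miniature of an attributed-product record; the field
# names are invented, not the real Icecat attributes.
products = [
    {"brand": "Acme", "category": "laptop", "screen_size": "15 inch"},
    {"brand": "Globex", "category": "monitor", "screen_size": "27 inch"},
    {"brand": "Initech", "category": "keyboard", "layout": "tenkeyless"},
]

def sample_queries(products, n, seed=42):
    # Build queries by sampling one or two attribute values per product,
    # mimicking a shopper combining a brand/category with a spec.
    rng = random.Random(seed)
    queries = []
    for _ in range(n):
        p = rng.choice(products)
        fields = rng.sample(sorted(p), rng.randint(1, 2))
        queries.append(" ".join(p[f] for f in fields))
    return queries

qs = sample_queries(products, 5)  # deterministic given the seed
```

Sampled queries like these give repeatable load at scale; the judgement lists are the genuinely harder part, since they need click logs or manual grading rather than sampling.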







[jira] [Commented] (SOLR-10317) Solr Nightly Benchmarks

2020-07-05 Thread David Smiley (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-10317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17151723#comment-17151723
 ] 

David Smiley commented on SOLR-10317:
-------------------------------------

Thanks for re-opening. I have my heart set on "Solr nightly benchmarks" (the 
title!) becoming a reality some day. It's been a journey with an important 
milestone reached, but we're still not there yet.

For "e-commerce-centric data sets", maybe ask on solr-user; I'm not sure. 
Wikipedia has certain strengths but _perhaps_ not enough "structure" for some 
of the more interesting tests. Geonames has decent structured data.







[jira] [Commented] (SOLR-10317) Solr Nightly Benchmarks

2020-07-03 Thread David Smiley (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-10317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17151183#comment-17151183
 ] 

David Smiley commented on SOLR-10317:
-------------------------------------

Closing this seems premature if there are no benchmarks being run automatically 
for us to simply view at a URL. Am I missing something? Perhaps the issue 
description should be updated to list the end product(s) / URLs, followed by a 
horizontal line and then the current description. The current description 
starts with a link to something that doesn't exist.

> Solr Nightly Benchmarks
> ---
>
> Key: SOLR-10317
> URL: https://issues.apache.org/jira/browse/SOLR-10317
> Project: Solr
>  Issue Type: Task
>Reporter: Ishan Chattopadhyaya
>Assignee: Ishan Chattopadhyaya
>Priority: Major
>  Labels: gsoc2017, mentor
> Attachments: 
> Narang-Vivek-SOLR-10317-Solr-Nightly-Benchmarks-FINAL-PROPOSAL.pdf, 
> Narang-Vivek-SOLR-10317-Solr-Nightly-Benchmarks.docx, SOLR-10317.patch, 
> SOLR-10317.patch, Screenshot from 2017-07-30 20-30-05.png, 
> changes-lucene-20160907.json, changes-solr-20160907.json, managed-schema, 
> solrconfig.xml
>
>
> Currently hosted at: http://212.47.242.214/MergedViewCloud.html
> 
> Solr needs nightly benchmarks reporting. Similar Lucene benchmarks can be 
> found here, https://home.apache.org/~mikemccand/lucenebench/.
> Preferably, we need:
> # A suite of benchmarks that build Solr from a commit point, start Solr 
> nodes, both in SolrCloud and standalone mode, and record timing information 
> of various operations like indexing, querying, faceting, grouping, 
> replication etc.
> # It should be possible to run them either as an independent suite or as a 
> Jenkins job, and we should be able to report timings as graphs (Jenkins has 
> some charting plugins).
> # The code should eventually be integrated in the Solr codebase, so that it 
> never goes out of date.
> There is some prior work / discussion:
> # https://github.com/shalinmangar/solr-perf-tools (Shalin)
> # https://github.com/chatman/solr-upgrade-tests/blob/master/BENCHMARKS.md 
> (Ishan/Vivek)
> # SOLR-2646 & SOLR-9863 (Mark Miller)
> # https://home.apache.org/~mikemccand/lucenebench/ (Mike McCandless)
> # https://github.com/lucidworks/solr-scale-tk (Tim Potter)
> There is support for building, starting, indexing/querying and stopping Solr 
> in some of these frameworks above. However, the benchmarks run are very 
> limited. Any of these can be a starting point, or a new framework can as well 
> be used. The motivation is to be able to cover every functionality of Solr 
> with a corresponding benchmark that is run every night.
> Proposing this as a GSoC 2017 project. I'm willing to mentor, and I'm sure 
> [~shalinmangar] and [~markrmil...@gmail.com] would help here.


