[GitHub] [incubator-druid] vogievetsky commented on issue #8823: Add InputSource and InputFormat interfaces

2019-11-04 Thread GitBox
vogievetsky commented on issue #8823: Add InputSource and InputFormat interfaces
URL: https://github.com/apache/incubator-druid/pull/8823#issuecomment-549699847
 
 
   This is amazing.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org



[GitHub] [incubator-druid] jihoonson opened a new pull request #8823: Add InputSource and InputFormat interfaces

2019-11-04 Thread GitBox
jihoonson opened a new pull request #8823: Add InputSource and InputFormat 
interfaces
URL: https://github.com/apache/incubator-druid/pull/8823
 
 
   ### Description
   
This is the first PR for https://github.com/apache/incubator-druid/issues/8812, which includes the new interfaces proposed in #8812. A couple of implementations are also included, such as `LocalInputSource` and `HttpInputSource` for `InputSource`, and `CsvInputFormat` and `JsonInputFormat` for `InputFormat`.
   
The old `firehose` and `parser` parameters should still work, but you cannot mix them with the new ones. Only the combinations `firehose` + `parser` or `inputSource` + `inputFormat` are allowed.
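The mutual-exclusion rule above can be sketched in a few lines. This is a hedged illustration, not Druid's actual validation code; the class and method names are invented for the sketch:

```java
// Illustrative sketch (names assumed, not Druid source): a spec may use the
// legacy firehose + parser style, or the new inputSource + inputFormat style,
// but never a mix of the two.
public class IngestionSpecCheck {
    public static boolean isValidCombination(boolean hasFirehose, boolean hasParser,
                                             boolean hasInputSource, boolean hasInputFormat) {
        boolean legacy = hasFirehose || hasParser;
        boolean modern = hasInputSource || hasInputFormat;
        // Mixing the legacy and new styles is rejected; either style alone is fine.
        return !(legacy && modern);
    }
}
```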
   
Documentation will be added after more inputSources and inputFormats are implemented in follow-up PRs.
   
   
   
   This PR has:
   - [x] been self-reviewed.
   - [ ] used the [concurrency checklist](https://github.com/apache/incubator-druid/blob/master/dev/code-review/concurrency.md) (Remove this item if the PR doesn't have any relation to concurrency.)
   - [ ] added documentation for new or modified features or behaviors.
   - [x] added Javadocs for most classes and all non-trivial methods. Linked 
related entities via Javadoc links.
   - [ ] added or updated version, license, or notice information in 
[licenses.yaml](https://github.com/apache/incubator-druid/blob/master/licenses.yaml)
   - [ ] added comments explaining the "why" and the intent of the code wherever it would not be obvious to an unfamiliar reader.
   - [x] added unit tests or modified existing tests to cover new code paths.
   - [ ] added integration tests.
   - [x] been tested in a test Druid cluster.
   
   
   





[GitHub] [incubator-druid] stale[bot] commented on issue #6922: Authorizer is never executed after passing from Authenticator

2019-11-04 Thread GitBox
stale[bot] commented on issue #6922: Authorizer is never executed after passing 
from Authenticator
URL: 
https://github.com/apache/incubator-druid/issues/6922#issuecomment-549661674
 
 
   This issue has been marked as stale due to 280 days of inactivity. It will 
be closed in 4 weeks if no further activity occurs. If this issue is still 
relevant, please simply write any comment. Even if closed, you can still revive 
the issue at any time or discuss it on the d...@druid.apache.org list. Thank 
you for your contributions.
   





[GitHub] [incubator-druid] suneet-amp commented on issue #8822: optimize numeric column null value checking for low filter selectivity (more rows)

2019-11-04 Thread GitBox
suneet-amp commented on issue #8822: optimize numeric column null value 
checking for low filter selectivity (more rows)
URL: https://github.com/apache/incubator-druid/pull/8822#issuecomment-549657961
 
 
   The heatmaps look super cool! (although I don't think I fully understand 
them yet :| ) What did you use to build them?





[GitHub] [incubator-druid] clintropolis commented on issue #6823: Use an enum for aggregator cache key byte identifiers instead of static values

2019-11-04 Thread GitBox
clintropolis commented on issue #6823: Use an enum for aggregator cache key 
byte identifiers instead of static values
URL: 
https://github.com/apache/incubator-druid/issues/6823#issuecomment-549650904
 
 
   Still relevant





[GitHub] [incubator-druid] stale[bot] commented on issue #6823: Use an enum for aggregator cache key byte identifiers instead of static values

2019-11-04 Thread GitBox
stale[bot] commented on issue #6823: Use an enum for aggregator cache key byte 
identifiers instead of static values
URL: 
https://github.com/apache/incubator-druid/issues/6823#issuecomment-549650919
 
 
   This issue is no longer marked as stale.
   





[GitHub] [incubator-druid] stale[bot] commented on issue #6823: Use an enum for aggregator cache key byte identifiers instead of static values

2019-11-04 Thread GitBox
stale[bot] commented on issue #6823: Use an enum for aggregator cache key byte 
identifiers instead of static values
URL: 
https://github.com/apache/incubator-druid/issues/6823#issuecomment-549640247
 
 
   This issue has been marked as stale due to 280 days of inactivity. It will 
be closed in 4 weeks if no further activity occurs. If this issue is still 
relevant, please simply write any comment. Even if closed, you can still revive 
the issue at any time or discuss it on the d...@druid.apache.org list. Thank 
you for your contributions.
   





[GitHub] [incubator-druid] clintropolis opened a new pull request #8822: optimize numeric column null value checking for low filter selectivity (more rows)

2019-11-04 Thread GitBox
clintropolis opened a new pull request #8822: optimize numeric column null 
value checking for low filter selectivity (more rows)
URL: https://github.com/apache/incubator-druid/pull/8822
 
 
   ### Description
   
   I would like to add documentation for the Druid SQL-compatible null handling added in #4349 to the website, in part because it has been merged for quite a while now, but also so that we can feel better about the changes #8566 causes in SQL behavior in Druid's native mode (tl;dr: some Calcite optimizations lead to some query incompatibilities, specifically around our treatment of `''` and `null` as equivalent).
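The `''` versus `null` distinction mentioned above can be made concrete with a small sketch. This is not Druid source code; the method names are invented to contrast the two modes as I understand them from the description:

```java
// Illustrative sketch (assumed semantics, not Druid source): legacy "native"
// null handling treats the empty string and null as equivalent, while
// SQL-compatible null handling keeps them distinct.
public class NullHandlingSketch {
    public static boolean legacyEquals(String a, String b) {
        // Legacy mode effectively coerces "" to null before comparing.
        String x = (a == null || a.isEmpty()) ? null : a;
        String y = (b == null || b.isEmpty()) ? null : b;
        return x == null ? y == null : x.equals(y);
    }

    public static boolean sqlCompatibleEquals(String a, String b) {
        // SQL-compatible mode: null equals nothing; "" is an ordinary value.
        return a != null && a.equals(b);
    }
}
```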
   
   Part of, like, being responsible and stuff: before adding this documentation I first wanted to collect some data to determine if it _was in fact a good idea to document it at this time_. The place I was specifically worried about was the `isNull` check, so I added a benchmark, `NullHandlingBitmapGetVsIteratorBenchmark`, and ran some experiments to see what to expect, as well as whether we could do better.
   
   The nature of selectors lends itself to using a bitmap iterator as an alternative to calling `bitmap.get` for every `isNull` check, so I first tested `get` vs `iterator` with various null bitmap densities and filter selectivities to simulate the overhead of a null-value-handling selector with each approach. I have so far only collected information on roaring, because with `concise` the `get` method would not complete at low selectivity, so I'm assuming its numbers are quite bad (more on this later).
   
   To help make sense of the numbers collected from this, I created heatmaps to compare the differences between the approaches. With the raw output, these looked something like this, which isn't so telling:
   
   
![raw](https://user-images.githubusercontent.com/1577461/68170194-b52bde00-ff23-11e9-9e84-2047cc67df72.gif)
   
   Interpolating the data to fill in the gaps yields something a bit prettier, but still not incredibly telling on its own:
   
![difference-iterator-get](https://user-images.githubusercontent.com/1577461/68170242-dbea1480-ff23-11e9-923e-67872816a1a3.gif)
   But it does start to suggest that the iterator is better at dealing with a larger number of rows, with the advantage diminishing as the density of the bitmap increases.
   
   Scaling the data points to estimate the _cost per row_ in nanoseconds 
provides a much more useful picture, and shows the abysmal performance of using 
the iterator at super high selectivity (e.g. 0.1% of rows match and are 
selected) with really dense bitmaps:
   
![difference-per-row-iterator-get](https://user-images.githubusercontent.com/1577461/68170352-426f3280-ff24-11e9-9782-7a47892f0f18.gif)
   However, if we truncate the results and compare areas where the `iterator` is better:
   
![difference-per-row-iterator-better](https://user-images.githubusercontent.com/1577461/68170371-53b83f00-ff24-11e9-989e-41ec0219d29c.gif)
   to areas where `get` is better:
   
![difference-per-row-get-better](https://user-images.githubusercontent.com/1577461/68170380-5b77e380-ff24-11e9-8f4f-538004046187.gif)
   it does look like the iterator performs up to 15 ns per row better than `get` when processing a lot of rows.
   
   While searching for a way around this limitation, I learned that the `IntIterator` for roaring is actually a `PeekableIntIterator`, an iterator that offers an `advanceIfNeeded` method for skipping ahead to an index. `ConciseSet` actually has a similar method on its iterator, `skipAllBefore`, which _is used by_ the `contains` method that concise's `get` uses! This is likely why concise just flat out wouldn't complete the benchmarks when processing a higher number of rows: every `get` creates an iterator, skips to the position, loops to check whether the iterator contains the index or passes it, and then throws it away.
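The trade-off between the two strategies can be sketched in a few lines. This is a hedged illustration, not Druid's code: a sorted int array stands in for the null-value bitmap so the example is self-contained, `isNullByGet` mimics a per-row `bitmap.get`, and `isNullByAdvance` mimics the `advanceIfNeeded` pattern (all names here are invented for the sketch):

```java
import java.util.Arrays;

// Sketch of the two null-check strategies: an independent lookup per row vs a
// forward-only cursor, which is cheap when rows are visited in increasing
// order (as selectors do).
public class NullCheckSketch {
    private final int[] nullRows;   // sorted ascending; stands in for the bitmap
    private int cursor = 0;         // state for the advancing-iterator strategy

    public NullCheckSketch(int[] sortedNullRows) {
        this.nullRows = sortedNullRows;
    }

    // Strategy 1: bitmap.get(row) analogue -- O(log n) lookup on every call.
    public boolean isNullByGet(int row) {
        return Arrays.binarySearch(nullRows, row) >= 0;
    }

    // Strategy 2: PeekableIntIterator.advanceIfNeeded(row) analogue --
    // amortized O(1) per call when rows arrive in increasing order.
    public boolean isNullByAdvance(int row) {
        while (cursor < nullRows.length && nullRows[cursor] < row) {
            cursor++;  // skip ahead; never rewinds
        }
        return cursor < nullRows.length && nullRows[cursor] == row;
    }
}
```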
   
   Adding the 'peekable' iterator to the benchmark had a similar outcome to 
using the plain iterator, 
   
   
![difference-peekable-iterator-get](https://user-images.githubusercontent.com/1577461/68170721-c8d84400-ff25-11e9-9c10-47d5ebf40933.gif)
   
   but without nearly the overhead at high selectivity on dense bitmaps. 
   
![difference-per-row-peekable-iterator-get](https://user-images.githubusercontent.com/1577461/68170735-d392d900-ff25-11e9-8476-7bb2728fe43b.gif)
   It's still worse, but not nearly as bad: no more than 60 ns per row slower than `get` when processing a small number of rows, compared to over 2 microseconds for the plain iterator.
   
   Peekable iterator better:
   
   
![difference-per-row-peekable-iterator-better](https://user-images.githubusercontent.com/1577461/68170780-f91fe280-ff25-11e9-86b8-6ff026d4813e.gif)
   
   Get better than peekable iterator:
   
![difference-per-row-peekable-iterator-get-better](https://user-images.githubusercontent.com/1577461/68170798-01781d80-ff26-11e9-9daf-ca95788bdb71.gif)
   
   F

[GitHub] [incubator-druid] ccaominh commented on a change in pull request #8748: Use earliest offset on kafka newly discovered partitions

2019-11-04 Thread GitBox
ccaominh commented on a change in pull request #8748: Use earliest offset on 
kafka newly discovered partitions
URL: https://github.com/apache/incubator-druid/pull/8748#discussion_r342339664
 
 

 ##
 File path: extensions-core/kafka-indexing-service/src/test/java/org/apache/druid/indexing/kafka/supervisor/KafkaSupervisorTest.java
 ##
 @@ -585,6 +589,108 @@ public void testLatestOffset() throws Exception
     );
   }
 
+  /**
+   * Test if partitionIds get updated
+   */
+  @Test
+  public void testPartitionIdsUpdates() throws Exception
+  {
+    supervisor = getTestableSupervisor(1, 1, false, "PT1H", null, null);
+    addSomeEvents(1100);
+
+    Capture<KafkaIndexTask> captured = Capture.newInstance();
+    EasyMock.expect(taskMaster.getTaskQueue()).andReturn(Optional.of(taskQueue)).anyTimes();
+    EasyMock.expect(taskMaster.getTaskRunner()).andReturn(Optional.absent()).anyTimes();
+    EasyMock.expect(taskStorage.getActiveTasksByDatasource(DATASOURCE)).andReturn(ImmutableList.of()).anyTimes();
+    EasyMock.expect(indexerMetadataStorageCoordinator.getDataSourceMetadata(DATASOURCE)).andReturn(
+        new KafkaDataSourceMetadata(
+            null
+        )
+    ).anyTimes();
+    EasyMock.expect(taskQueue.add(EasyMock.capture(captured))).andReturn(true);
+    replayAll();
+
+    supervisor.start();
+    supervisor.runInternal();
+    verifyAll();
+
+    Assert.assertFalse(supervisor.isPartitionIdsEmpty());
+  }
+
+  /**
+   * Test For if always use earliest offset on newly discovered partitions
+   */
+  @Ignore("This is a regression test that needs to wait 10s+, ignore for now")
+  @Test
+  public void testLatestOffsetOnDiscovery() throws Exception
+  {
+    supervisor = getTestableSupervisor(1, 1, false, "PT1H", null, null);
+    addSomeEvents(9);
+
+    Capture<KafkaIndexTask> captured = Capture.newInstance();
+    EasyMock.expect(taskMaster.getTaskQueue()).andReturn(Optional.of(taskQueue)).anyTimes();
+    EasyMock.expect(taskMaster.getTaskRunner()).andReturn(Optional.absent()).anyTimes();
+    EasyMock.expect(taskStorage.getActiveTasksByDatasource(DATASOURCE)).andReturn(ImmutableList.of()).anyTimes();
+    EasyMock.expect(indexerMetadataStorageCoordinator.getDataSourceMetadata(DATASOURCE)).andReturn(
+        new KafkaDataSourceMetadata(
+            null
+        )
+    ).anyTimes();
+    EasyMock.expect(taskQueue.add(EasyMock.capture(captured))).andReturn(true);
+    replayAll();
+    supervisor.start();
+    supervisor.runInternal();
+    verifyAll();
+
+    KafkaIndexTask task = captured.getValue();
+    Assert.assertEquals(
+        10,
+        task.getIOConfig().getStartSequenceNumbers().getPartitionSequenceNumberMap().get(0).longValue()
+    );
+    Assert.assertEquals(
+        10,
+        task.getIOConfig().getStartSequenceNumbers().getPartitionSequenceNumberMap().get(1).longValue()
+    );
+    Assert.assertEquals(
+        10,
+        task.getIOConfig().getStartSequenceNumbers().getPartitionSequenceNumberMap().get(2).longValue()
+    );
+
+    addMoreEvents(9, 6);
+    Thread.sleep(1);
 
 Review comment:
   The GitHub comment is useful, but having it in the source code is what I had 
in mind.





[GitHub] [incubator-druid] SEKIRO-J commented on a change in pull request #8748: Use earliest offset on kafka newly discovered partitions

2019-11-04 Thread GitBox
SEKIRO-J commented on a change in pull request #8748: Use earliest offset on 
kafka newly discovered partitions
URL: https://github.com/apache/incubator-druid/pull/8748#discussion_r342334184
 
 

 ##
 File path: extensions-core/kafka-indexing-service/src/test/java/org/apache/druid/indexing/kafka/supervisor/KafkaSupervisorTest.java
 ##
 @@ -585,6 +589,108 @@ public void testLatestOffset() throws Exception
     );
   }
 
+  /**
+   * Test if partitionIds get updated
+   */
+  @Test
+  public void testPartitionIdsUpdates() throws Exception
+  {
+    supervisor = getTestableSupervisor(1, 1, false, "PT1H", null, null);
+    addSomeEvents(1100);
+
+    Capture<KafkaIndexTask> captured = Capture.newInstance();
+    EasyMock.expect(taskMaster.getTaskQueue()).andReturn(Optional.of(taskQueue)).anyTimes();
+    EasyMock.expect(taskMaster.getTaskRunner()).andReturn(Optional.absent()).anyTimes();
+    EasyMock.expect(taskStorage.getActiveTasksByDatasource(DATASOURCE)).andReturn(ImmutableList.of()).anyTimes();
+    EasyMock.expect(indexerMetadataStorageCoordinator.getDataSourceMetadata(DATASOURCE)).andReturn(
+        new KafkaDataSourceMetadata(
+            null
+        )
+    ).anyTimes();
+    EasyMock.expect(taskQueue.add(EasyMock.capture(captured))).andReturn(true);
+    replayAll();
+
+    supervisor.start();
+    supervisor.runInternal();
+    verifyAll();
+
+    Assert.assertFalse(supervisor.isPartitionIdsEmpty());
+  }
+
+  /**
+   * Test For if always use earliest offset on newly discovered partitions
+   */
+  @Ignore("This is a regression test that needs to wait 10s+, ignore for now")
+  @Test
+  public void testLatestOffsetOnDiscovery() throws Exception
+  {
+    supervisor = getTestableSupervisor(1, 1, false, "PT1H", null, null);
+    addSomeEvents(9);
+
+    Capture<KafkaIndexTask> captured = Capture.newInstance();
+    EasyMock.expect(taskMaster.getTaskQueue()).andReturn(Optional.of(taskQueue)).anyTimes();
+    EasyMock.expect(taskMaster.getTaskRunner()).andReturn(Optional.absent()).anyTimes();
+    EasyMock.expect(taskStorage.getActiveTasksByDatasource(DATASOURCE)).andReturn(ImmutableList.of()).anyTimes();
+    EasyMock.expect(indexerMetadataStorageCoordinator.getDataSourceMetadata(DATASOURCE)).andReturn(
+        new KafkaDataSourceMetadata(
+            null
+        )
+    ).anyTimes();
+    EasyMock.expect(taskQueue.add(EasyMock.capture(captured))).andReturn(true);
+    replayAll();
+    supervisor.start();
+    supervisor.runInternal();
+    verifyAll();
+
+    KafkaIndexTask task = captured.getValue();
+    Assert.assertEquals(
+        10,
+        task.getIOConfig().getStartSequenceNumbers().getPartitionSequenceNumberMap().get(0).longValue()
+    );
+    Assert.assertEquals(
+        10,
+        task.getIOConfig().getStartSequenceNumbers().getPartitionSequenceNumberMap().get(1).longValue()
+    );
+    Assert.assertEquals(
+        10,
+        task.getIOConfig().getStartSequenceNumbers().getPartitionSequenceNumberMap().get(2).longValue()
+    );
+
+    addMoreEvents(9, 6);
+    Thread.sleep(1);
 
 Review comment:
   https://github.com/apache/incubator-druid/pull/8748#issuecomment-548955564





[GitHub] [incubator-druid] SEKIRO-J commented on a change in pull request #8748: Use earliest offset on kafka newly discovered partitions

2019-11-04 Thread GitBox
SEKIRO-J commented on a change in pull request #8748: Use earliest offset on 
kafka newly discovered partitions
URL: https://github.com/apache/incubator-druid/pull/8748#discussion_r342334184
 
 

 ##
 File path: extensions-core/kafka-indexing-service/src/test/java/org/apache/druid/indexing/kafka/supervisor/KafkaSupervisorTest.java
 ##
 @@ -585,6 +589,108 @@ public void testLatestOffset() throws Exception
     );
   }
 
+  /**
+   * Test if partitionIds get updated
+   */
+  @Test
+  public void testPartitionIdsUpdates() throws Exception
+  {
+    supervisor = getTestableSupervisor(1, 1, false, "PT1H", null, null);
+    addSomeEvents(1100);
+
+    Capture<KafkaIndexTask> captured = Capture.newInstance();
+    EasyMock.expect(taskMaster.getTaskQueue()).andReturn(Optional.of(taskQueue)).anyTimes();
+    EasyMock.expect(taskMaster.getTaskRunner()).andReturn(Optional.absent()).anyTimes();
+    EasyMock.expect(taskStorage.getActiveTasksByDatasource(DATASOURCE)).andReturn(ImmutableList.of()).anyTimes();
+    EasyMock.expect(indexerMetadataStorageCoordinator.getDataSourceMetadata(DATASOURCE)).andReturn(
+        new KafkaDataSourceMetadata(
+            null
+        )
+    ).anyTimes();
+    EasyMock.expect(taskQueue.add(EasyMock.capture(captured))).andReturn(true);
+    replayAll();
+
+    supervisor.start();
+    supervisor.runInternal();
+    verifyAll();
+
+    Assert.assertFalse(supervisor.isPartitionIdsEmpty());
+  }
+
+  /**
+   * Test For if always use earliest offset on newly discovered partitions
+   */
+  @Ignore("This is a regression test that needs to wait 10s+, ignore for now")
+  @Test
+  public void testLatestOffsetOnDiscovery() throws Exception
+  {
+    supervisor = getTestableSupervisor(1, 1, false, "PT1H", null, null);
+    addSomeEvents(9);
+
+    Capture<KafkaIndexTask> captured = Capture.newInstance();
+    EasyMock.expect(taskMaster.getTaskQueue()).andReturn(Optional.of(taskQueue)).anyTimes();
+    EasyMock.expect(taskMaster.getTaskRunner()).andReturn(Optional.absent()).anyTimes();
+    EasyMock.expect(taskStorage.getActiveTasksByDatasource(DATASOURCE)).andReturn(ImmutableList.of()).anyTimes();
+    EasyMock.expect(indexerMetadataStorageCoordinator.getDataSourceMetadata(DATASOURCE)).andReturn(
+        new KafkaDataSourceMetadata(
+            null
+        )
+    ).anyTimes();
+    EasyMock.expect(taskQueue.add(EasyMock.capture(captured))).andReturn(true);
+    replayAll();
+    supervisor.start();
+    supervisor.runInternal();
+    verifyAll();
+
+    KafkaIndexTask task = captured.getValue();
+    Assert.assertEquals(
+        10,
+        task.getIOConfig().getStartSequenceNumbers().getPartitionSequenceNumberMap().get(0).longValue()
+    );
+    Assert.assertEquals(
+        10,
+        task.getIOConfig().getStartSequenceNumbers().getPartitionSequenceNumberMap().get(1).longValue()
+    );
+    Assert.assertEquals(
+        10,
+        task.getIOConfig().getStartSequenceNumbers().getPartitionSequenceNumberMap().get(2).longValue()
+    );
+
+    addMoreEvents(9, 6);
+    Thread.sleep(1);
 
 Review comment:
   https://github.com/apache/incubator-druid/pull/8748#discussion_r342276353





[GitHub] [incubator-druid] ccaominh commented on a change in pull request #8748: Use earliest offset on kafka newly discovered partitions

2019-11-04 Thread GitBox
ccaominh commented on a change in pull request #8748: Use earliest offset on 
kafka newly discovered partitions
URL: https://github.com/apache/incubator-druid/pull/8748#discussion_r342276331
 
 

 ##
 File path: extensions-core/kafka-indexing-service/src/test/java/org/apache/druid/indexing/kafka/supervisor/KafkaSupervisorTest.java
 ##
 @@ -585,6 +589,108 @@ public void testLatestOffset() throws Exception
     );
   }
 
+  /**
+   * Test if partitionIds get updated
+   */
+  @Test
+  public void testPartitionIdsUpdates() throws Exception
+  {
+    supervisor = getTestableSupervisor(1, 1, false, "PT1H", null, null);
+    addSomeEvents(1100);
+
+    Capture<KafkaIndexTask> captured = Capture.newInstance();
+    EasyMock.expect(taskMaster.getTaskQueue()).andReturn(Optional.of(taskQueue)).anyTimes();
+    EasyMock.expect(taskMaster.getTaskRunner()).andReturn(Optional.absent()).anyTimes();
+    EasyMock.expect(taskStorage.getActiveTasksByDatasource(DATASOURCE)).andReturn(ImmutableList.of()).anyTimes();
+    EasyMock.expect(indexerMetadataStorageCoordinator.getDataSourceMetadata(DATASOURCE)).andReturn(
+        new KafkaDataSourceMetadata(
+            null
+        )
+    ).anyTimes();
+    EasyMock.expect(taskQueue.add(EasyMock.capture(captured))).andReturn(true);
+    replayAll();
+
+    supervisor.start();
+    supervisor.runInternal();
+    verifyAll();
+
+    Assert.assertFalse(supervisor.isPartitionIdsEmpty());
+  }
+
+  /**
+   * Test For if always use earliest offset on newly discovered partitions
+   */
+  @Ignore("This is a regression test that needs to wait 10s+, ignore for now")
+  @Test
+  public void testLatestOffsetOnDiscovery() throws Exception
 
 Review comment:
   Suggestion: Rename test method to something like 
`testAlwaysUsesEarliestOffsetForNewlyDiscoveredPartitions` and get rid of the 
method javadoc.





[GitHub] [incubator-druid] ccaominh commented on a change in pull request #8748: Use earliest offset on kafka newly discovered partitions

2019-11-04 Thread GitBox
ccaominh commented on a change in pull request #8748: Use earliest offset on 
kafka newly discovered partitions
URL: https://github.com/apache/incubator-druid/pull/8748#discussion_r342281423
 
 

 ##
 File path: extensions-core/kafka-indexing-service/src/test/java/org/apache/druid/indexing/kafka/supervisor/KafkaSupervisorTest.java
 ##
 @@ -585,6 +589,108 @@ public void testLatestOffset() throws Exception
     );
   }
 
+  /**
+   * Test if partitionIds get updated
+   */
+  @Test
+  public void testPartitionIdsUpdates() throws Exception
+  {
+    supervisor = getTestableSupervisor(1, 1, false, "PT1H", null, null);
+    addSomeEvents(1100);
+
+    Capture<KafkaIndexTask> captured = Capture.newInstance();
+    EasyMock.expect(taskMaster.getTaskQueue()).andReturn(Optional.of(taskQueue)).anyTimes();
+    EasyMock.expect(taskMaster.getTaskRunner()).andReturn(Optional.absent()).anyTimes();
+    EasyMock.expect(taskStorage.getActiveTasksByDatasource(DATASOURCE)).andReturn(ImmutableList.of()).anyTimes();
+    EasyMock.expect(indexerMetadataStorageCoordinator.getDataSourceMetadata(DATASOURCE)).andReturn(
+        new KafkaDataSourceMetadata(
+            null
+        )
+    ).anyTimes();
+    EasyMock.expect(taskQueue.add(EasyMock.capture(captured))).andReturn(true);
+    replayAll();
+
+    supervisor.start();
+    supervisor.runInternal();
+    verifyAll();
+
+    Assert.assertFalse(supervisor.isPartitionIdsEmpty());
+  }
+
+  /**
+   * Test For if always use earliest offset on newly discovered partitions
+   */
+  @Ignore("This is a regression test that needs to wait 10s+, ignore for now")
+  @Test
+  public void testLatestOffsetOnDiscovery() throws Exception
+  {
+    supervisor = getTestableSupervisor(1, 1, false, "PT1H", null, null);
+    addSomeEvents(9);
+
+    Capture<KafkaIndexTask> captured = Capture.newInstance();
+    EasyMock.expect(taskMaster.getTaskQueue()).andReturn(Optional.of(taskQueue)).anyTimes();
 
 Review comment:
   FYI, for this use case, using `andStubReturn()` may be simpler than doing 
`andReturn().anyTimes()`:
   
https://stackoverflow.com/questions/34233447/is-there-any-difference-between-andreturn-anytimes-and-andstubreturn/34477207





[GitHub] [incubator-druid] ccaominh commented on a change in pull request #8748: Use earliest offset on kafka newly discovered partitions

2019-11-04 Thread GitBox
ccaominh commented on a change in pull request #8748: Use earliest offset on 
kafka newly discovered partitions
URL: https://github.com/apache/incubator-druid/pull/8748#discussion_r342276659
 
 

 ##
 File path: extensions-core/kafka-indexing-service/src/test/java/org/apache/druid/indexing/kafka/supervisor/KafkaSupervisorTest.java
 ##
 @@ -585,6 +589,108 @@ public void testLatestOffset() throws Exception
     );
   }
 
+  /**
+   * Test if partitionIds get updated
+   */
+  @Test
+  public void testPartitionIdsUpdates() throws Exception
+  {
+    supervisor = getTestableSupervisor(1, 1, false, "PT1H", null, null);
+    addSomeEvents(1100);
+
+    Capture<KafkaIndexTask> captured = Capture.newInstance();
+    EasyMock.expect(taskMaster.getTaskQueue()).andReturn(Optional.of(taskQueue)).anyTimes();
+    EasyMock.expect(taskMaster.getTaskRunner()).andReturn(Optional.absent()).anyTimes();
+    EasyMock.expect(taskStorage.getActiveTasksByDatasource(DATASOURCE)).andReturn(ImmutableList.of()).anyTimes();
+    EasyMock.expect(indexerMetadataStorageCoordinator.getDataSourceMetadata(DATASOURCE)).andReturn(
+        new KafkaDataSourceMetadata(
+            null
+        )
+    ).anyTimes();
+    EasyMock.expect(taskQueue.add(EasyMock.capture(captured))).andReturn(true);
+    replayAll();
+
+    supervisor.start();
+    supervisor.runInternal();
+    verifyAll();
+
+    Assert.assertFalse(supervisor.isPartitionIdsEmpty());
+  }
+
+  /**
+   * Test For if always use earliest offset on newly discovered partitions
+   */
+  @Ignore("This is a regression test that needs to wait 10s+, ignore for now")
+  @Test
+  public void testLatestOffsetOnDiscovery() throws Exception
+  {
+    supervisor = getTestableSupervisor(1, 1, false, "PT1H", null, null);
+    addSomeEvents(9);
+
+    Capture<KafkaIndexTask> captured = Capture.newInstance();
+    EasyMock.expect(taskMaster.getTaskQueue()).andReturn(Optional.of(taskQueue)).anyTimes();
+    EasyMock.expect(taskMaster.getTaskRunner()).andReturn(Optional.absent()).anyTimes();
+    EasyMock.expect(taskStorage.getActiveTasksByDatasource(DATASOURCE)).andReturn(ImmutableList.of()).anyTimes();
+    EasyMock.expect(indexerMetadataStorageCoordinator.getDataSourceMetadata(DATASOURCE)).andReturn(
+        new KafkaDataSourceMetadata(
+            null
+        )
+    ).anyTimes();
+    EasyMock.expect(taskQueue.add(EasyMock.capture(captured))).andReturn(true);
+    replayAll();
+    supervisor.start();
+    supervisor.runInternal();
+    verifyAll();
+
+    KafkaIndexTask task = captured.getValue();
+    Assert.assertEquals(
+        10,
+        task.getIOConfig().getStartSequenceNumbers().getPartitionSequenceNumberMap().get(0).longValue()
+    );
+    Assert.assertEquals(
+        10,
+        task.getIOConfig().getStartSequenceNumbers().getPartitionSequenceNumberMap().get(1).longValue()
+    );
+    Assert.assertEquals(
+        10,
+        task.getIOConfig().getStartSequenceNumbers().getPartitionSequenceNumberMap().get(2).longValue()
+    );
+
+    addMoreEvents(9, 6);
+    Thread.sleep(1);
 
 Review comment:
   Adding a comment explaining why you need the 10-second sleep here would be helpful.





[GitHub] [incubator-druid] ccaominh commented on a change in pull request #8748: Use earliest offset on kafka newly discovered partitions

2019-11-04 Thread GitBox
ccaominh commented on a change in pull request #8748: Use earliest offset on 
kafka newly discovered partitions
URL: https://github.com/apache/incubator-druid/pull/8748#discussion_r342276353
 
 

 ##
 File path: extensions-core/kafka-indexing-service/src/main/java/org/apache/druid/indexing/kafka/supervisor/KafkaSupervisor.java
 ##
 @@ -156,9 +156,12 @@ protected void scheduleReporting(ScheduledExecutorService reportingExec)
 
 
   @Override
-  protected int getTaskGroupIdForPartition(Integer partition)
+  protected int getTaskGroupIdForPartition(Integer partitionId)
   {
-    return partition % spec.getIoConfig().getTaskCount();
+    if (!partitionIds.contains(partitionId)) {
 
 Review comment:
   Can you add a comment explaining how this change prevents the original bug?
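For readers following along, here is a purely illustrative sketch (NOT the actual Druid change) of how a membership guard like `if (!partitionIds.contains(partitionId))` can keep group assignment stable: register newly discovered partitions in a discovery-ordered list and map each partition to a group via its position in that list, rather than hashing the raw partition id.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: stable partition-to-task-group assignment.
// Positions in partitionIds never change once recorded, so a partition always
// maps to the same group regardless of when other partitions are discovered.
public class PartitionGrouping
{
  private final List<Integer> partitionIds = new ArrayList<>();
  private final int taskCount;

  public PartitionGrouping(int taskCount)
  {
    this.taskCount = taskCount;
  }

  public int getTaskGroupIdForPartition(int partitionId)
  {
    if (!partitionIds.contains(partitionId)) {
      partitionIds.add(partitionId); // first sighting: record discovery order
    }
    return partitionIds.indexOf(partitionId) % taskCount;
  }
}
```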





[GitHub] [incubator-druid] vogievetsky commented on issue #8777: Web console: Interval input component

2019-11-04 Thread GitBox
vogievetsky commented on issue #8777: Web console: Interval input component
URL: https://github.com/apache/incubator-druid/pull/8777#issuecomment-549519351
 
 
   Could you restrict the input to only accept chars that are valid for intervals, so as not to allow this:
   
   
![image](https://user-images.githubusercontent.com/177816/68153006-9dd7fb00-fef9-11e9-80d0-28952d1942c9.png)
   





[GitHub] [incubator-druid] suneet-amp commented on a change in pull request #8815: SQL: EARLIEST, LATEST aggregators.

2019-11-04 Thread GitBox
suneet-amp commented on a change in pull request #8815: SQL: EARLIEST, LATEST 
aggregators.
URL: https://github.com/apache/incubator-druid/pull/8815#discussion_r342236463
 
 

 ##
 File path: sql/src/main/java/org/apache/druid/sql/calcite/aggregation/builtin/EarliestLatestSqlAggregator.java
 ##
 @@ -0,0 +1,221 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.sql.calcite.aggregation.builtin;
+
+import org.apache.calcite.rel.core.AggregateCall;
+import org.apache.calcite.rel.core.Project;
+import org.apache.calcite.rex.RexBuilder;
+import org.apache.calcite.rex.RexLiteral;
+import org.apache.calcite.rex.RexNode;
+import org.apache.calcite.sql.SqlAggFunction;
+import org.apache.calcite.sql.SqlFunctionCategory;
+import org.apache.calcite.sql.SqlKind;
+import org.apache.calcite.sql.type.InferTypes;
+import org.apache.calcite.sql.type.OperandTypes;
+import org.apache.calcite.sql.type.ReturnTypes;
+import org.apache.calcite.sql.type.SqlTypeName;
+import org.apache.druid.java.util.common.ISE;
+import org.apache.druid.query.aggregation.AggregatorFactory;
+import org.apache.druid.query.aggregation.first.DoubleFirstAggregatorFactory;
+import org.apache.druid.query.aggregation.first.FloatFirstAggregatorFactory;
+import org.apache.druid.query.aggregation.first.LongFirstAggregatorFactory;
+import org.apache.druid.query.aggregation.first.StringFirstAggregatorFactory;
+import org.apache.druid.query.aggregation.last.DoubleLastAggregatorFactory;
+import org.apache.druid.query.aggregation.last.FloatLastAggregatorFactory;
+import org.apache.druid.query.aggregation.last.LongLastAggregatorFactory;
+import org.apache.druid.query.aggregation.last.StringLastAggregatorFactory;
+import org.apache.druid.query.aggregation.post.FinalizingFieldAccessPostAggregator;
+import org.apache.druid.segment.VirtualColumn;
+import org.apache.druid.segment.column.ValueType;
+import org.apache.druid.sql.calcite.aggregation.Aggregation;
+import org.apache.druid.sql.calcite.aggregation.SqlAggregator;
+import org.apache.druid.sql.calcite.expression.DruidExpression;
+import org.apache.druid.sql.calcite.expression.Expressions;
+import org.apache.druid.sql.calcite.planner.Calcites;
+import org.apache.druid.sql.calcite.planner.PlannerContext;
+import org.apache.druid.sql.calcite.rel.VirtualColumnRegistry;
+import org.apache.druid.sql.calcite.table.RowSignature;
+
+import javax.annotation.Nullable;
+import java.util.Collections;
+import java.util.List;
+import java.util.Objects;
+import java.util.stream.Collectors;
+import java.util.stream.Stream;
+
+public class EarliestLatestSqlAggregator implements SqlAggregator
+{
+  public static final SqlAggregator EARLIEST = new EarliestLatestSqlAggregator(EarliestOrLatest.EARLIEST);
+  public static final SqlAggregator LATEST = new EarliestLatestSqlAggregator(EarliestOrLatest.LATEST);
+
+  enum EarliestOrLatest
+  {
+    EARLIEST {
 Review comment:
   nit: leave a comment here reminding people not to rename the enum since the 
name() is used in the AggFunction below





[GitHub] [incubator-druid] suneet-amp commented on a change in pull request #8815: SQL: EARLIEST, LATEST aggregators.

2019-11-04 Thread GitBox
suneet-amp commented on a change in pull request #8815: SQL: EARLIEST, LATEST 
aggregators.
URL: https://github.com/apache/incubator-druid/pull/8815#discussion_r342239233
 
 

 ##
 File path: sql/src/main/java/org/apache/druid/sql/calcite/aggregation/builtin/EarliestLatestSqlAggregator.java
 ##
 @@ -0,0 +1,221 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.sql.calcite.aggregation.builtin;
+
+import org.apache.calcite.rel.core.AggregateCall;
+import org.apache.calcite.rel.core.Project;
+import org.apache.calcite.rex.RexBuilder;
+import org.apache.calcite.rex.RexLiteral;
+import org.apache.calcite.rex.RexNode;
+import org.apache.calcite.sql.SqlAggFunction;
+import org.apache.calcite.sql.SqlFunctionCategory;
+import org.apache.calcite.sql.SqlKind;
+import org.apache.calcite.sql.type.InferTypes;
+import org.apache.calcite.sql.type.OperandTypes;
+import org.apache.calcite.sql.type.ReturnTypes;
+import org.apache.calcite.sql.type.SqlTypeName;
+import org.apache.druid.java.util.common.ISE;
+import org.apache.druid.query.aggregation.AggregatorFactory;
+import org.apache.druid.query.aggregation.first.DoubleFirstAggregatorFactory;
+import org.apache.druid.query.aggregation.first.FloatFirstAggregatorFactory;
+import org.apache.druid.query.aggregation.first.LongFirstAggregatorFactory;
+import org.apache.druid.query.aggregation.first.StringFirstAggregatorFactory;
+import org.apache.druid.query.aggregation.last.DoubleLastAggregatorFactory;
+import org.apache.druid.query.aggregation.last.FloatLastAggregatorFactory;
+import org.apache.druid.query.aggregation.last.LongLastAggregatorFactory;
+import org.apache.druid.query.aggregation.last.StringLastAggregatorFactory;
+import org.apache.druid.query.aggregation.post.FinalizingFieldAccessPostAggregator;
+import org.apache.druid.segment.VirtualColumn;
+import org.apache.druid.segment.column.ValueType;
+import org.apache.druid.sql.calcite.aggregation.Aggregation;
+import org.apache.druid.sql.calcite.aggregation.SqlAggregator;
+import org.apache.druid.sql.calcite.expression.DruidExpression;
+import org.apache.druid.sql.calcite.expression.Expressions;
+import org.apache.druid.sql.calcite.planner.Calcites;
+import org.apache.druid.sql.calcite.planner.PlannerContext;
+import org.apache.druid.sql.calcite.rel.VirtualColumnRegistry;
+import org.apache.druid.sql.calcite.table.RowSignature;
+
+import javax.annotation.Nullable;
+import java.util.Collections;
+import java.util.List;
+import java.util.Objects;
+import java.util.stream.Collectors;
+import java.util.stream.Stream;
+
+public class EarliestLatestSqlAggregator implements SqlAggregator
+{
+  public static final SqlAggregator EARLIEST = new EarliestLatestSqlAggregator(EarliestOrLatest.EARLIEST);
+  public static final SqlAggregator LATEST = new EarliestLatestSqlAggregator(EarliestOrLatest.LATEST);
+
+  enum EarliestOrLatest
+  {
+    EARLIEST {
+      @Override
+      AggregatorFactory createAggregatorFactory(String name, String fieldName, ValueType type, int maxStringBytes)
+      {
+        switch (type) {
+          case LONG:
+            return new LongFirstAggregatorFactory(name, fieldName);
+          case FLOAT:
+            return new FloatFirstAggregatorFactory(name, fieldName);
+          case DOUBLE:
+            return new DoubleFirstAggregatorFactory(name, fieldName);
+          case STRING:
+            return new StringFirstAggregatorFactory(name, fieldName, maxStringBytes);
 
 Review comment:
   Do we want to validate that maxStringBytes >= 0 in both the aggregator 
factories? I traced through the code and I think an exception will be thrown in 
the String*BufferAggregator#aggregate because there will be an out of bounds 
exception.
   
   Also it's not clear to me what the expected result should be if 
maxStringBytes is 0
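For illustration, here is a minimal sketch of the kind of constructor-time validation being suggested (the class and field names are hypothetical, not Druid's): rejecting a non-positive `maxStringBytes` up front fails fast instead of surfacing later as an out-of-bounds error during aggregation.

```java
// Hypothetical config holder demonstrating fail-fast validation of maxStringBytes.
public class StringAggregatorConfig
{
  private final String fieldName;
  private final int maxStringBytes;

  public StringAggregatorConfig(String fieldName, int maxStringBytes)
  {
    if (maxStringBytes <= 0) {
      // Validate at construction, where the bad value is easy to trace,
      // rather than deep inside a buffer aggregator.
      throw new IllegalArgumentException("maxStringBytes must be > 0, got [" + maxStringBytes + "]");
    }
    this.fieldName = fieldName;
    this.maxStringBytes = maxStringBytes;
  }

  public String getFieldName()
  {
    return fieldName;
  }

  public int getMaxStringBytes()
  {
    return maxStringBytes;
  }
}
```

Whether 0 should be legal (and what it would mean) is exactly the open question raised above; this sketch simply assumes it is not.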



[GitHub] [incubator-druid] suneet-amp commented on a change in pull request #8815: SQL: EARLIEST, LATEST aggregators.

2019-11-04 Thread GitBox
suneet-amp commented on a change in pull request #8815: SQL: EARLIEST, LATEST 
aggregators.
URL: https://github.com/apache/incubator-druid/pull/8815#discussion_r342235271
 
 

 ##
 File path: sql/src/test/java/org/apache/druid/sql/calcite/CalciteQueryTest.java
 ##
 @@ -889,6 +896,123 @@ public void testGroupBySingleColumnDescendingNoTopN() throws Exception
     );
   }
 
+  @Test
+  public void testEarliestAggregators() throws Exception
+  {
+    // Cannot vectorize EARLIEST aggregator.
+    skipVectorize();
 
 Review comment:
   Does this need to be reset at the end of the test? Otherwise, won't all the other tests that run after this run with vectorize disabled? Or is that being handled by some rule that I'm not seeing in this delta?





[GitHub] [incubator-druid] suneet-amp commented on a change in pull request #8814: SQL: Add RAND() function.

2019-11-04 Thread GitBox
suneet-amp commented on a change in pull request #8814: SQL: Add RAND() 
function.
URL: https://github.com/apache/incubator-druid/pull/8814#discussion_r342217314
 
 

 ##
 File path: processing/src/main/java/org/apache/druid/query/expression/RandExprMacro.java
 ##
 @@ -0,0 +1,95 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.query.expression;
+
+import com.google.common.collect.Iterables;
+import org.apache.druid.java.util.common.IAE;
+import org.apache.druid.math.expr.Expr;
+import org.apache.druid.math.expr.ExprEval;
+import org.apache.druid.math.expr.ExprMacroTable;
+import org.apache.druid.segment.DimensionHandlerUtils;
+
+import java.util.Collections;
+import java.util.List;
+import java.util.Random;
+import java.util.concurrent.ThreadLocalRandom;
+import java.util.function.DoubleSupplier;
+
+public class RandExprMacro implements ExprMacroTable.ExprMacro
+{
+  @Override
+  public String name()
+  {
+return "rand";
+  }
+
+  @Override
+  public Expr apply(List<Expr> args)
+  {
+    final DoubleSupplier randomGenerator;
+
+    if (args.isEmpty()) {
+      randomGenerator = () -> ThreadLocalRandom.current().nextDouble();
 
 Review comment:
   If you wanted to make the test in ExpressionsTest.java easier, you could 
inject an annotated `Supplier` into this class something like
   
   ```java
   @Inject
   RandExprMacro(@ThreadRandom Supplier localThreadRandomSupplier) {
     ...
   }
   ```
   
   As I'm typing this, it feels like overkill, so feel free to ignore
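A lighter-weight alternative, shown here as a stdlib-only sketch (not the PR's code): a `java.util.Random` built from a fixed seed is already deterministic, so a test can compare two suppliers created from the same literal seed without any injection machinery.

```java
import java.util.Random;
import java.util.function.DoubleSupplier;

// Minimal demonstration that a seeded rand() is testable as-is:
// java.util.Random with a fixed seed yields a reproducible sequence.
public class SeededRand
{
  public static DoubleSupplier seeded(long seed)
  {
    final Random random = new Random(seed);
    return random::nextDouble; // each call advances the seeded sequence
  }

  public static void main(String[] args)
  {
    DoubleSupplier a = seeded(42L);
    DoubleSupplier b = seeded(42L);
    // Same seed, same position in the sequence, same value.
    System.out.println(a.getAsDouble() == b.getAsDouble());
  }
}
```

Injection would still be needed to make the no-argument (ThreadLocalRandom) path deterministic, which is probably why it feels like overkill for the seeded path alone.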





[GitHub] [incubator-druid] suneet-amp commented on a change in pull request #8814: SQL: Add RAND() function.

2019-11-04 Thread GitBox
suneet-amp commented on a change in pull request #8814: SQL: Add RAND() 
function.
URL: https://github.com/apache/incubator-druid/pull/8814#discussion_r342201369
 
 

 ##
 File path: processing/src/main/java/org/apache/druid/query/expression/RandExprMacro.java
 ##
 @@ -0,0 +1,95 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.query.expression;
+
+import com.google.common.collect.Iterables;
+import org.apache.druid.java.util.common.IAE;
+import org.apache.druid.math.expr.Expr;
+import org.apache.druid.math.expr.ExprEval;
+import org.apache.druid.math.expr.ExprMacroTable;
+import org.apache.druid.segment.DimensionHandlerUtils;
+
+import java.util.Collections;
+import java.util.List;
+import java.util.Random;
+import java.util.concurrent.ThreadLocalRandom;
+import java.util.function.DoubleSupplier;
+
+public class RandExprMacro implements ExprMacroTable.ExprMacro
+{
+  @Override
+  public String name()
+  {
+return "rand";
+  }
+
+  @Override
+  public Expr apply(List<Expr> args)
+  {
+    final DoubleSupplier randomGenerator;
+
+    if (args.isEmpty()) {
+      randomGenerator = () -> ThreadLocalRandom.current().nextDouble();
+    } else if (args.size() == 1) {
+      final Expr seedArg = Iterables.getOnlyElement(args);
+      if (seedArg.isLiteral()) {
+        final Long seedValue = DimensionHandlerUtils.convertObjectToLong(seedArg.getLiteralValue());
+        if (seedValue != null) {
+          final Random random = new Random(seedValue);
+          randomGenerator = random::nextDouble;
+        } else {
+          throw new IAE("Function[%s] first argument must be a number literal");
+        }
+      } else {
+        throw new IAE("Function[%s] first argument must be a number literal");
+      }
+    } else {
+      throw new IAE("Function[%s] needs zero or one arguments", name());
+    }
+
+    class RandExpr extends ExprMacroTable.BaseScalarMacroFunctionExpr
 
 Review comment:
   👀 I've never seen this pattern before.
   
   dumb guy question: Is there any advantage of using this vs a private static 
inner class. Do we do this just so we don't have to pass `randomGenerator` to 
the constructor of this class?
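To answer with a standalone, stdlib-only sketch (the names here are illustrative, not the PR's actual code): yes, a method-local class captures effectively final locals from the enclosing method directly, which is exactly what a private static nested class cannot do without taking them as constructor parameters.

```java
import java.util.function.DoubleSupplier;

// Sketch of the method-local class pattern: RandExpr below captures
// `randomGenerator` from the enclosing method, so no constructor parameter or
// field wiring is needed. A private static nested class would require the
// supplier to be passed in explicitly.
public class LocalClassDemo
{
  public static DoubleSupplier makeExpr(final DoubleSupplier randomGenerator)
  {
    class RandExpr implements DoubleSupplier // visible only inside this method
    {
      @Override
      public double getAsDouble()
      {
        return randomGenerator.getAsDouble(); // captured effectively-final local
      }
    }
    return new RandExpr();
  }
}
```

So the trade-off is scope: the local class is invisible outside the method, in exchange for free capture of locals.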





[GitHub] [incubator-druid] zhenxiao commented on a change in pull request #8808: use copy-on-write list in InMemoryAppender

2019-11-04 Thread GitBox
zhenxiao commented on a change in pull request #8808: use copy-on-write list in 
InMemoryAppender
URL: https://github.com/apache/incubator-druid/pull/8808#discussion_r342215463
 
 

 ##
 File path: core/src/test/java/org/apache/druid/testing/junit/LoggerCaptureRule.java
 ##
 @@ -78,12 +77,13 @@ public void clearLogEvents()
   {
     static final String NAME = InMemoryAppender.class.getName();
 
-    private final List<LogEvent> logEvents;
+    // need copy-on-write collection for thread safety of iteration and modification concurrently outside of a critical section
 
 Review comment:
   how about:
   logEvents has concurrent iteration and modification in CuratorModuleTest::exitsJvmWhenMaxRetriesExceeded(), needs to be thread safe





[GitHub] [incubator-druid] jessp01 commented on issue #8305: Fix to always use end sequenceNumber for reset

2019-11-04 Thread GitBox
jessp01 commented on issue #8305: Fix to always use end sequenceNumber for reset
URL: https://github.com/apache/incubator-druid/pull/8305#issuecomment-549496846
 
 
   Hello all,
   
   I'm hitting the exact same issue with 0.15.1. Since upgrading to 0.16.0 
requires additional work, does anyone happen to have a built version of 0.15.1 
with these changes I can obtain and test? 





[GitHub] [incubator-druid] leventov commented on a change in pull request #8808: use copy-on-write list in InMemoryAppender

2019-11-04 Thread GitBox
leventov commented on a change in pull request #8808: use copy-on-write list in 
InMemoryAppender
URL: https://github.com/apache/incubator-druid/pull/8808#discussion_r342202395
 
 

 ##
 File path: core/src/test/java/org/apache/druid/testing/junit/LoggerCaptureRule.java
 ##
 @@ -78,12 +77,13 @@ public void clearLogEvents()
   {
     static final String NAME = InMemoryAppender.class.getName();
 
-    private final List<LogEvent> logEvents;
+    // need copy-on-write collection for thread safety of iteration and modification concurrently outside of a critical section
 
 Review comment:
   This comment doesn't achieve its goal: to help somebody to see why the 
enclosing class needs to be thread-safe. With the current comment, in order to 
do that, the reader (developer) should invoke "Find usages" IDE action and 
analyze these usages carefully. This is time-consuming. In essence, there 
should be concurrent access documentation, as described in 
https://github.com/code-review-checklists/java-concurrency#justify-document.
   
   This comment is also misleading because there are no critical sections in 
this code.
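The property such documentation would describe can be demonstrated with a stdlib-only sketch (variable names chosen to echo the discussion, not taken from the PR): a `CopyOnWriteArrayList` iterator operates on an immutable snapshot, so modifying the list during iteration never throws `ConcurrentModificationException`.

```java
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Demonstration of CopyOnWriteArrayList snapshot-iterator semantics:
// additions made while iterating are not seen by an already-created iterator,
// and no ConcurrentModificationException is thrown.
public class CowListDemo
{
  public static int iterateWhileAdding()
  {
    List<String> logEvents = new CopyOnWriteArrayList<>();
    logEvents.add("event-1");
    logEvents.add("event-2");

    int seen = 0;
    Iterator<String> it = logEvents.iterator(); // snapshot of the current contents
    while (it.hasNext()) {
      it.next();
      seen++;
      logEvents.add("added-during-iteration"); // safe: snapshot iterator ignores this
    }
    return seen; // only the two snapshot elements were seen
  }
}
```

A class-level comment stating this (who iterates, who mutates, and why copy-on-write makes the combination safe) is the kind of concurrent-access documentation the checklist asks for.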





[GitHub] [incubator-druid] jihoonson commented on issue #8821: Upgrade joda-time to 2.10.5

2019-11-04 Thread GitBox
jihoonson commented on issue #8821: Upgrade joda-time to 2.10.5
URL: https://github.com/apache/incubator-druid/pull/8821#issuecomment-549477950
 
 
   Seems like there was an issue with the thrift repo. Restarted the failed ones.





[GitHub] [incubator-druid] vogievetsky commented on issue #8820: groupBy() query is not working in Node Js

2019-11-04 Thread GitBox
vogievetsky commented on issue #8820: groupBy() query is not working in Node Js
URL: https://github.com/apache/incubator-druid/issues/8820#issuecomment-549462718
 
 
   Could you post more details of the query and the error please?





[GitHub] [incubator-druid] gianm commented on issue #8814: SQL: Add RAND() function.

2019-11-04 Thread GitBox
gianm commented on issue #8814: SQL: Add RAND() function.
URL: https://github.com/apache/incubator-druid/pull/8814#issuecomment-549434724
 
 
   > Does it fully fix #8661?
   
   I believe the other functions already exist (check the docs) so yes.





[incubator-druid] branch master updated (49bd167 -> 511fa74)

2019-11-04 Thread gian
This is an automated email from the ASF dual-hosted git repository.

gian pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-druid.git.


from 49bd167  serve web-console even if router management proxy is not 
enabled (#8797)
 add 511fa74  Move maxFetchRetry to FetchConfig; rename OpenObject (#8776)

No new revisions were added by this update.

Summary of changes:
 .../{PrefetchConfig.java => FetchConfig.java}      | 29 +++
 .../druid/data/input/impl/prefetch/Fetcher.java    | 48 ++
 .../data/input/impl/prefetch/FileFetcher.java      | 27 --
 .../{OpenedObject.java => OpenObject.java}         |  6 +--
 .../PrefetchableTextFilesFirehoseFactory.java      | 58 ++
 .../firehose/PrefetchSqlFirehoseFactory.java       | 40 +++
 .../segment/realtime/firehose/SqlFetcher.java      | 14 +++---
 7 files changed, 108 insertions(+), 114 deletions(-)
 rename core/src/main/java/org/apache/druid/data/input/impl/prefetch/{PrefetchConfig.java => FetchConfig.java} (75%)
 rename core/src/main/java/org/apache/druid/data/input/impl/prefetch/{OpenedObject.java => OpenObject.java} (92%)





[GitHub] [incubator-druid] gianm merged pull request #8776: Move maxFetchRetry to FetchConfig; rename OpenObject

2019-11-04 Thread GitBox
gianm merged pull request #8776: Move maxFetchRetry to FetchConfig; rename 
OpenObject
URL: https://github.com/apache/incubator-druid/pull/8776
 
 
   





[GitHub] [incubator-druid] a2l007 opened a new pull request #8821: Upgrade joda-time to 2.10.5

2019-11-04 Thread GitBox
a2l007 opened a new pull request #8821: Upgrade joda-time to 2.10.5
URL: https://github.com/apache/incubator-druid/pull/8821
 
 
   Upgrading joda-time to include the latest timezone data.
   An important timezone change is that Brazil recently scrapped daylight saving time (https://www.timeanddate.com/news/time/brazil-scraps-dst.html), and this version includes that change.





[GitHub] [incubator-druid] Jaymin-adzmedia opened a new issue #8820: groupBy() query is not working in Node Js

2019-11-04 Thread GitBox
Jaymin-adzmedia opened a new issue #8820: groupBy() query is not working in 
Node Js
URL: https://github.com/apache/incubator-druid/issues/8820
 
 
   Hello,
   
   I need to get records grouped by time, so I used a groupBy() query. A topN() query works fine, but as soon as I use groupBy(), Node.js shows an error.
   
   Node.js: internal server error, code 5000.
   
   Can you help me solve this?





[GitHub] [incubator-druid] clintropolis edited a comment on issue #8578: parallel broker merges on fork join pool

2019-11-04 Thread GitBox
clintropolis edited a comment on issue #8578: parallel broker merges on fork 
join pool
URL: https://github.com/apache/incubator-druid/pull/8578#issuecomment-549253205
 
 
   ### more realistic worst case
   I reworked the JMH thread based benchmark to use thread groups to examine 
what happens in a more realistic scenario, with the newly renamed 
`ParallelMergeCombiningSequenceThreadedBenchmark`. I find this benchmark to be 
a fair bit less scary than the previous 'worst case' benchmarks, which focused 
on an impossible scenario because I really wanted to dig in and see where and 
how the wheels fell off.
   
   This benchmark models a more 'typical' heavy load, where the majority of the 
queries are smaller result-sets with shorter blocking times and a smaller 
subset are larger result sets with longer initial blocking times. By using 
thread groups we can look at performance for these 'classes' of queries as load 
increases. 
   
   This set was collected with a ratio of 1 'moderately large' query for every 
8 'small' queries, where 'moderately large' is defined as input sequence row 
counts of 50k-75k rows and blocking for 1-2.5 seconds before yielding results, 
and 'small' is defined as input sequence row counts of 500-10k and blocking for 
50-200ms. Keep in mind while reviewing the result that I collected data on a 
significantly higher level of parallelism than I would expect a 16 core machine 
to be realistically configured to handle. I would probably configure an m5.8xl 
with no more than 64 http threads, but collected data points up to 128 
concurrent sequences being processed just to see where things went.
   
   The first plot shows the merge time (y axis) growth as concurrency (x axis) 
increases, animated to show the differences for a given number of input 
sequences (analogous to cluster size).
   
   
![thread-groups-typical-distribution-1-8-small](https://user-images.githubusercontent.com/1577461/68105759-6125e880-fe94-11e9-86a4-cae8fb52b92b.gif)
   
   Note that the x axis is the _total_ concurrency count, not the number of 
threads of this particular group. Also worth pointing out is that the 
degradation of performance happens at a significantly higher level of 
concurrency than the previous (unrealistic) worst-case performance, but in
terms of characteristics, it does share some aspects with the previous plots, 
such as 8 input sequences being a lot more performant than say 64, and after a 
certain threshold, the performance of the parallel approach crosses the limit 
of the same threaded serial merge approach.
   
   
   The larger 'queries' tell a similar tale:
   
   
![thread-groups-typical-distribution-1-8-moderately-large](https://user-images.githubusercontent.com/1577461/68106055-4142f480-fe95-11e9-897b-57c7cf8b4ace.gif)
   
   The differences here when the parallel merge sequence crosses the threshold 
look to me a fair bit less dramatic than the 'small' sequences, but keep in 
mind the 'big jump' in the small sequences only amount to a few hundred 
milliseconds, so it's not quite as dramatic as it appears.
   
   The final plot shows the overall average between both groups:
   
   
![thread-groups-typical-distribution-1-8-average](https://user-images.githubusercontent.com/1577461/68105727-46ec0a80-fe94-11e9-9854-aaae9d8405c7.gif)
   
   which I find a bit less useful than the other 2 plots, but included anyway 
for completeness.
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org



[GitHub] [incubator-druid] vogievetsky opened a new pull request #8819: Web console: show hollow circle when unavailable

2019-11-04 Thread GitBox
vogievetsky opened a new pull request #8819: Web console: show hollow circle 
when unavailable
URL: https://github.com/apache/incubator-druid/pull/8819
 
 
   Show a hollow circle for the special state when a data source is totally unavailable.
   
   
![image](https://user-images.githubusercontent.com/177816/68107208-f1662c80-fe98-11e9-978d-7d0ebc0813a7.png)
   
   It makes sense for it to have a special state.
   





[GitHub] [incubator-druid] clintropolis commented on issue #8578: parallel broker merges on fork join pool

2019-11-04 Thread GitBox
clintropolis commented on issue #8578: parallel broker merges on fork join pool
URL: https://github.com/apache/incubator-druid/pull/8578#issuecomment-549253205
 
 
   ### more realistic worst case
   I reworked the JMH thread based benchmark to use thread groups to examine 
what happens in a more realistic scenario, with the newly renamed 
`ParallelMergeCombiningSequenceThreadedBenchmark`. I find this benchmark to be 
a fair bit less scary than the previous 'worst case' benchmarks, which focused 
on an impossible scenario because I really wanted to dig in and see where and 
how the wheels fell off.
   
   This benchmark models a more 'typical' heavy load, where the majority of the queries are smaller result sets with shorter blocking times and a smaller subset are larger result sets with longer initial blocking times. By using thread groups, we can look at performance for these 'classes' of queries as load increases.
   
   This set was collected with a ratio of 1 'moderately large' query for every 8 'small' queries, where 'moderately large' is defined as an input sequence row count of 50k-75k rows and blocking for 1-2.5 seconds before yielding results, and 'small' is defined as an input sequence row count of 500-10k rows and blocking for 50-200ms. Keep in mind while reviewing the results that I collected data at a significantly higher level of parallelism than I would expect a 16 core machine to be realistically configured to handle. I would probably configure an m5.8xl with ~64 http threads, but collected data points up to 128 concurrent sequences being processed.
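   To make the workload mix concrete, here is a small sketch of the two query classes and the 1:8 sampling described above. The class and method names are hypothetical and this is a simplified stand-in for illustration only, not the actual `ParallelMergeCombiningSequenceThreadedBenchmark` (which uses JMH thread groups to pin a fixed number of threads to each query class rather than interleaving by index):

   ```java
   import java.util.Random;

   // A simplified model of the benchmark's workload mix: 1 'moderately large'
   // query for every 8 'small' ones. All names here are hypothetical.
   public class WorkloadMix {
       public static final int SMALL_PER_LARGE = 8;

       public static final class QueryParams {
           public final int inputRows;           // rows in the input sequence
           public final long initialBlockMillis; // simulated delay before the first result
           QueryParams(int inputRows, long initialBlockMillis) {
               this.inputRows = inputRows;
               this.initialBlockMillis = initialBlockMillis;
           }
       }

       // every (SMALL_PER_LARGE + 1)-th query is 'moderately large': 50k-75k input
       // rows, blocking 1-2.5s; the rest are 'small': 500-10k rows, blocking 50-200ms
       public static QueryParams nextQuery(int queryIndex, Random rng) {
           if (queryIndex % (SMALL_PER_LARGE + 1) == 0) {
               return new QueryParams(50_000 + rng.nextInt(25_001), 1_000 + rng.nextInt(1_501));
           }
           return new QueryParams(500 + rng.nextInt(9_501), 50 + rng.nextInt(151));
       }

       public static void main(String[] args) {
           Random rng = new Random(0);
           for (int i = 0; i < 10; i++) {
               QueryParams q = nextQuery(i, rng);
               System.out.println((q.inputRows >= 50_000 ? "large" : "small")
                   + " rows=" + q.inputRows + " blockMs=" + q.initialBlockMillis);
           }
       }
   }
   ```

   Sampling 900 queries with this scheme yields exactly 100 'moderately large' and 800 'small' queries, matching the 1:8 ratio.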
   
   The first plot shows the merge time (y axis) growth as concurrency (x axis) 
increases, animated to show the differences for a given number of input 
sequences (analogous to cluster size).
   
   
![thread-groups-typical-distribution-1-8-small](https://user-images.githubusercontent.com/1577461/68105759-6125e880-fe94-11e9-86a4-cae8fb52b92b.gif)
   
   Note that the x axis is the _total_ concurrency count, not the number of threads in this particular group. Also worth pointing out is that the degradation in performance happens at a significantly higher level of concurrency than in the previous (unrealistic) worst-case benchmarks, but it does share some characteristics with the previous plots: 8 input sequences are a lot more performant than, say, 64, and after a certain threshold the performance of the parallel approach crosses that of the same-threaded serial merge approach.
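   To illustrate the serial-versus-parallel tradeoff being measured, here is a minimal k-way merge of sorted arrays done both serially and on a `ForkJoinPool`. This is only a sketch of the general technique, with hypothetical names, and not Druid's actual `ParallelMergeCombiningSequence`, which merges lazily over `Sequence`s and combines results while merging:

   ```java
   import java.util.Arrays;
   import java.util.Comparator;
   import java.util.List;
   import java.util.PriorityQueue;
   import java.util.concurrent.ForkJoinPool;
   import java.util.concurrent.RecursiveTask;

   // Illustrative sketch only: not Druid's ParallelMergeCombiningSequence.
   public class MergeSketch {
       // serial k-way merge using a min-heap of {value, inputIndex, offset} entries
       static int[] serialMerge(List<int[]> inputs) {
           int total = 0;
           for (int[] in : inputs) {
               total += in.length;
           }
           int[] out = new int[total];
           PriorityQueue<int[]> heap = new PriorityQueue<>(Comparator.comparingInt((int[] e) -> e[0]));
           for (int i = 0; i < inputs.size(); i++) {
               if (inputs.get(i).length > 0) {
                   heap.add(new int[]{inputs.get(i)[0], i, 0});
               }
           }
           int pos = 0;
           while (!heap.isEmpty()) {
               int[] e = heap.poll();
               out[pos++] = e[0];
               int next = e[2] + 1;
               int[] source = inputs.get(e[1]);
               if (next < source.length) {
                   heap.add(new int[]{source[next], e[1], next});
               }
           }
           return out;
       }

       // parallel merge: recursively fork on halves of the input set, then merge the two results
       static class MergeTask extends RecursiveTask<int[]> {
           private final List<int[]> inputs;
           MergeTask(List<int[]> inputs) {
               this.inputs = inputs;
           }
           @Override
           protected int[] compute() {
               if (inputs.size() <= 2) {
                   return serialMerge(inputs);
               }
               int mid = inputs.size() / 2;
               MergeTask left = new MergeTask(inputs.subList(0, mid));
               left.fork(); // run the left half asynchronously on the pool
               int[] right = new MergeTask(inputs.subList(mid, inputs.size())).compute();
               return serialMerge(Arrays.asList(left.join(), right));
           }
       }

       static int[] parallelMerge(List<int[]> inputs) {
           return ForkJoinPool.commonPool().invoke(new MergeTask(inputs));
       }

       public static void main(String[] args) {
           List<int[]> inputs = Arrays.asList(new int[]{1, 4, 7}, new int[]{2, 5, 8}, new int[]{3, 6, 9});
           System.out.println(Arrays.toString(parallelMerge(inputs))); // prints [1, 2, 3, 4, 5, 6, 7, 8, 9]
       }
   }
   ```

   The crossover behavior in the plots plausibly comes from this kind of structure: once the pool is saturated, the forked subtasks mostly add coordination overhead on top of what a plain serial merge would do anyway.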
   
   
   The larger 'queries' tell a similar tale:
   
   
![thread-groups-typical-distribution-1-8-moderately-large](https://user-images.githubusercontent.com/1577461/68106055-4142f480-fe95-11e9-897b-57c7cf8b4ace.gif)
   
   The differences here when the parallel merge sequence crosses the threshold look to me a fair bit less dramatic than for the 'small' sequences, but keep in mind the 'big jump' in the small sequences only amounts to a few hundred milliseconds, so it's not quite as dramatic as it appears.
   
   The final plot shows the overall average across both groups:
   
   
![thread-groups-typical-distribution-1-8-average](https://user-images.githubusercontent.com/1577461/68105727-46ec0a80-fe94-11e9-9854-aaae9d8405c7.gif)
   
   which I find a bit less useful than the other two plots, but have included anyway for completeness.
   

