[GitHub] [incubator-druid] himanshug commented on a change in pull request #8573: Stateful auto compaction

2019-10-15 Thread GitBox
himanshug commented on a change in pull request #8573: Stateful auto compaction
URL: https://github.com/apache/incubator-druid/pull/8573#discussion_r335160549
 
 

 ##
 File path: 
indexing-service/src/main/java/org/apache/druid/indexing/common/task/BatchAppenderators.java
 ##
 @@ -59,13 +63,21 @@ public static Appenderator newAppenderator(
   TaskToolbox toolbox,
   DataSchema dataSchema,
   AppenderatorConfig appenderatorConfig,
+  FirehoseFactory firehoseFactory,
   DataSegmentPusher segmentPusher
   )
   {
+final boolean isReingest;
+if (firehoseFactory instanceof IngestSegmentFirehoseFactory) {
+  isReingest = 
dataSchema.getDataSource().equals(((IngestSegmentFirehoseFactory) 
firehoseFactory).getDataSource());
+} else {
+  isReingest = false;
+}
 
 Review comment:
   For now, I think this config is best left undocumented and for auto 
compaction internal usage only.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org



[GitHub] [incubator-druid] himanshug commented on a change in pull request #8573: Stateful auto compaction

2019-10-15 Thread GitBox
himanshug commented on a change in pull request #8573: Stateful auto compaction
URL: https://github.com/apache/incubator-druid/pull/8573#discussion_r335159299
 
 

 ##
 File path: core/src/main/java/org/apache/druid/timeline/CompactionState.java
 ##
 @@ -0,0 +1,85 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.timeline;
+
+import com.fasterxml.jackson.annotation.JsonCreator;
+import com.fasterxml.jackson.annotation.JsonProperty;
+import org.apache.druid.indexer.partitions.PartitionsSpec;
+
+import java.util.Map;
+import java.util.Objects;
+
+public class CompactionState
+{
+  private final PartitionsSpec partitionsSpec;
+  // org.apache.druid.segment.IndexSpec cannot be used here to avoid the 
dependency cycle
 
 Review comment:
   got it, thanks.
   
   maybe in next round of module merge: merge core into processing if there is 
no use case of anyone depending on druid-core directly.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org



[GitHub] [incubator-druid] himanshug commented on a change in pull request #8573: Stateful auto compaction

2019-10-15 Thread GitBox
himanshug commented on a change in pull request #8573: Stateful auto compaction
URL: https://github.com/apache/incubator-druid/pull/8573#discussion_r335159299
 
 

 ##
 File path: core/src/main/java/org/apache/druid/timeline/CompactionState.java
 ##
 @@ -0,0 +1,85 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.timeline;
+
+import com.fasterxml.jackson.annotation.JsonCreator;
+import com.fasterxml.jackson.annotation.JsonProperty;
+import org.apache.druid.indexer.partitions.PartitionsSpec;
+
+import java.util.Map;
+import java.util.Objects;
+
+public class CompactionState
+{
+  private final PartitionsSpec partitionsSpec;
+  // org.apache.druid.segment.IndexSpec cannot be used here to avoid the 
dependency cycle
 
 Review comment:
   got it, thanks.
   
   maybe in next round of module merge: merge core into processing if there is 
no use case of anyone depending on druid-core directly :)


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org



[GitHub] [incubator-druid] himanshug commented on a change in pull request #8573: Stateful auto compaction

2019-10-15 Thread GitBox
himanshug commented on a change in pull request #8573: Stateful auto compaction
URL: https://github.com/apache/incubator-druid/pull/8573#discussion_r335158395
 
 

 ##
 File path: core/src/main/java/org/apache/druid/timeline/CompactionState.java
 ##
 @@ -0,0 +1,85 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.timeline;
+
+import com.fasterxml.jackson.annotation.JsonCreator;
+import com.fasterxml.jackson.annotation.JsonProperty;
+import org.apache.druid.indexer.partitions.PartitionsSpec;
+
+import java.util.Map;
+import java.util.Objects;
+
+public class CompactionState
 
 Review comment:
   LGTM, thanks.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org



[GitHub] [incubator-druid] himanshug commented on a change in pull request #8573: Stateful auto compaction

2019-10-11 Thread GitBox
himanshug commented on a change in pull request #8573: Stateful auto compaction
URL: https://github.com/apache/incubator-druid/pull/8573#discussion_r334211148
 
 

 ##
 File path: 
indexing-service/src/main/java/org/apache/druid/indexing/common/task/BatchAppenderators.java
 ##
 @@ -59,13 +63,21 @@ public static Appenderator newAppenderator(
   TaskToolbox toolbox,
   DataSchema dataSchema,
   AppenderatorConfig appenderatorConfig,
+  FirehoseFactory firehoseFactory,
   DataSegmentPusher segmentPusher
   )
   {
+final boolean isReingest;
+if (firehoseFactory instanceof IngestSegmentFirehoseFactory) {
+  isReingest = 
dataSchema.getDataSource().equals(((IngestSegmentFirehoseFactory) 
firehoseFactory).getDataSource());
+} else {
+  isReingest = false;
+}
 
 Review comment:
   yeah , that would work and let user (in this case auto compaction code) 
explicitly say whether CompactionState should be saved or not.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org



[GitHub] [incubator-druid] himanshug commented on a change in pull request #8573: Stateful auto compaction

2019-10-11 Thread GitBox
himanshug commented on a change in pull request #8573: Stateful auto compaction
URL: https://github.com/apache/incubator-druid/pull/8573#discussion_r334167505
 
 

 ##
 File path: 
indexing-service/src/main/java/org/apache/druid/indexing/common/task/BatchAppenderators.java
 ##
 @@ -59,13 +63,21 @@ public static Appenderator newAppenderator(
   TaskToolbox toolbox,
   DataSchema dataSchema,
   AppenderatorConfig appenderatorConfig,
+  FirehoseFactory firehoseFactory,
   DataSegmentPusher segmentPusher
   )
   {
+final boolean isReingest;
+if (firehoseFactory instanceof IngestSegmentFirehoseFactory) {
+  isReingest = 
dataSchema.getDataSource().equals(((IngestSegmentFirehoseFactory) 
firehoseFactory).getDataSource());
+} else {
+  isReingest = false;
+}
 
 Review comment:
   is it possible to drive this from auto compaction code in druid coordinator 
instead as `IngestSegmentFirehoseFactory` could be used outside of auto 
compaction as well. For example, as a user , knowing my data flow, I can setup 
re-index task to run every day for previous day's data .. sort of a manual 
compaction. But in that case, the `CompactionState` doesn't need to be 
preserved.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org



[GitHub] [incubator-druid] himanshug commented on a change in pull request #8573: Stateful auto compaction

2019-10-11 Thread GitBox
himanshug commented on a change in pull request #8573: Stateful auto compaction
URL: https://github.com/apache/incubator-druid/pull/8573#discussion_r334163948
 
 

 ##
 File path: docs/configuration/index.md
 ##
 @@ -786,8 +786,7 @@ A description of the compaction config is:
 |`dataSource`|dataSource name to be compacted.|yes|
 |`taskPriority`|[Priority](../ingestion/tasks.html#priority) of compaction 
task.|no (default = 25)|
 |`inputSegmentSizeBytes`|Maximum number of total segment bytes processed per 
compaction task. Since a time chunk must be processed in its entirety, if the 
segments for a particular time chunk have a total size in bytes greater than 
this parameter, compaction will not run for that time chunk. Because each 
compaction task runs with a single thread, setting this value too far above 
1–2GB will result in compaction tasks taking an excessive amount of time.|no 
(default = 419430400)|
-|`targetCompactionSizeBytes`|The target segment size, for each segment, after 
compaction. The actual sizes of compacted segments might be slightly larger or 
smaller than this value. Each compaction task may generate more than one output 
segment, and it will try to keep each output segment close to this configured 
size. This configuration cannot be used together with `maxRowsPerSegment`.|no 
(default = 419430400)|
 
 Review comment:
   we should probably add a blurb in release notes for this , just in case some 
people set this property and expect something to happen.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org



[GitHub] [incubator-druid] himanshug commented on a change in pull request #8573: Stateful auto compaction

2019-10-11 Thread GitBox
himanshug commented on a change in pull request #8573: Stateful auto compaction
URL: https://github.com/apache/incubator-druid/pull/8573#discussion_r334163388
 
 

 ##
 File path: core/src/main/java/org/apache/druid/timeline/CompactionState.java
 ##
 @@ -0,0 +1,85 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.timeline;
+
+import com.fasterxml.jackson.annotation.JsonCreator;
+import com.fasterxml.jackson.annotation.JsonProperty;
+import org.apache.druid.indexer.partitions.PartitionsSpec;
+
+import java.util.Map;
+import java.util.Objects;
+
+public class CompactionState
+{
+  private final PartitionsSpec partitionsSpec;
+  // org.apache.druid.segment.IndexSpec cannot be used here to avoid the 
dependency cycle
 
 Review comment:
   couldn't understand what is the cycle ? .. do you mean `IndexSpec` can 
contain a `CompactionState` , so json serde would fail ?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org



[GitHub] [incubator-druid] himanshug commented on a change in pull request #8573: Stateful auto compaction

2019-10-11 Thread GitBox
himanshug commented on a change in pull request #8573: Stateful auto compaction
URL: https://github.com/apache/incubator-druid/pull/8573#discussion_r334162197
 
 

 ##
 File path: core/src/main/java/org/apache/druid/timeline/CompactionState.java
 ##
 @@ -0,0 +1,85 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.timeline;
+
+import com.fasterxml.jackson.annotation.JsonCreator;
+import com.fasterxml.jackson.annotation.JsonProperty;
+import org.apache.druid.indexer.partitions.PartitionsSpec;
+
+import java.util.Map;
+import java.util.Objects;
+
+public class CompactionState
 
 Review comment:
   Can you please add javadoc for this class describing what this is, why it 
has the field it does ...(I know there is some discussion in proposal, but it 
would be very non-obvious for someone reading the code) and what guarantees it 
provides ... e.g. something like if a  `CompactionTask` is run with parameters 
matching here then row distribution in segments created would be exactly same.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org



[GitHub] [incubator-druid] himanshug commented on a change in pull request #8573: Stateful auto compaction

2019-10-11 Thread GitBox
himanshug commented on a change in pull request #8573: Stateful auto compaction
URL: https://github.com/apache/incubator-druid/pull/8573#discussion_r334162197
 
 

 ##
 File path: core/src/main/java/org/apache/druid/timeline/CompactionState.java
 ##
 @@ -0,0 +1,85 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.timeline;
+
+import com.fasterxml.jackson.annotation.JsonCreator;
+import com.fasterxml.jackson.annotation.JsonProperty;
+import org.apache.druid.indexer.partitions.PartitionsSpec;
+
+import java.util.Map;
+import java.util.Objects;
+
+public class CompactionState
 
 Review comment:
   Can you please add javadoc for this class describing what this is and what 
guarantees it provides ... e.g. something like if a  `CompactionTask` is run 
with parameters matching here then row distribution in segments created would 
be exactly same.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org



[GitHub] [incubator-druid] himanshug commented on a change in pull request #8573: Stateful auto compaction

2019-09-25 Thread GitBox
himanshug commented on a change in pull request #8573: Stateful auto compaction
URL: https://github.com/apache/incubator-druid/pull/8573#discussion_r328373459
 
 

 ##
 File path: 
server/src/main/java/org/apache/druid/server/coordinator/helper/DruidCoordinatorSegmentCompactor.java
 ##
 @@ -178,12 +178,11 @@ private CoordinatorStats doRun(
   final List segmentsToCompact = iterator.next();
   final String dataSourceName = segmentsToCompact.get(0).getDataSource();
 
-  if (segmentsToCompact.size() > 1) {
+  if (!segmentsToCompact.isEmpty()) {
 
 Review comment:
   Line#179 should move in this if block.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org